= Storage Options =<br />
Researchers have several options for storing their research data. <br />
<br />
== Data Classification ==<br />
Please review the data classifications outlined in the [https://ucalgary.ca/legal-services/sites/default/files/teams/1/Standards-Legal-Information-Security-Classification-Standard.pdf Information Security Classification Standard]. There are four levels of data classification, which are summarized in the table below.<br />
<br />
{| class="wikitable"<br />
! Level<br />
! Description<br />
! Example<br />
|-<br />
| Level 1<br />
| Public<br />
|<br />
* Reference data sets<br />
* Published research data<br />
|-<br />
| Level 2<br />
| Internal<br />
|<br />
* Internal memos<br />
* Unpublished research data<br />
* Anonymized or de-identified human subject data<br />
* Library transactions and journals<br />
|-<br />
| Level 3<br />
| Confidential<br />
|<br />
* Faculty/staff employment applications, personnel files, contact information<br />
* Donor or prospective donor information<br />
* Contracts<br />
* Intellectual property<br />
|-<br />
| Level 4<br />
| Restricted<br />
|<br />
* Patient identifiable health information<br />
* Identifiable human subject research data<br />
* Information subject to special government requirements<br />
|}<br />
<br />
When selecting a storage option, you must choose one whose rating meets or exceeds your data's security classification.<br />
<br />
* See also the Collaboration, storage and file shares article in Service Now:<br />
: https://ucalgary.service-now.com/it?id=it_catalog_by_category&sys_id=4dbb82ee13661200c524fc04e144b044<br />
<br />
== Research Data Management ==<br />
We recommend you follow good Research Data Management practices and ensure you have created a DMP (Data Management Plan) to guide your data's lifecycle. DMP Assistant was created specifically for Canadian scholars and aims to meet Tri-Agency requirements. See: https://assistant.portagenetwork.ca/<br />
<br />
Your DMP can help us support the FAIR (findable, accessible, interoperable and reusable) principles for data management.<br />
<br />
Please consider contacting Libraries and Cultural Resources for assistance. For guidance on general data management and developing a DMP, consult https://library.ucalgary.ca/guides/researchdatamanagement or contact research.data@ucalgary.ca.<br />
<br />
For support using PRISM Dataverse, UofC's institutional data repository, contact digitize@ucalgary.ca.<br />
<br />
If you need to share and preserve a large post-publication data set for a mandated period of time, visit https://www.frdr-dfdr.ca/repo/ to learn more about the national Federated Research Data Repository (FRDR). <br />
<br />
FRDR aligns with the Tri-Agency principles as a platform for the preservation, retention, and sharing of research data. See: [http://www.science.gc.ca/eic/site/063.nsf/eng/h_83F7624E.html Tri-Agency Statement of Principles on Digital Data Management]<br />
<br />
<br />
== Secure Compute Data Storage (SCDS) ==<br />
Secure Compute Data Storage (SCDS) is a service provided by Research Computing Services that allows researchers to store restricted and confidential data. Collaboration on Level 4 data stored in SCDS is possible using ShareFile, a secure file-sharing and collaboration tool from Citrix.<br />
<br />
{| class="wikitable"<br />
! Capacity<br />
| 10 GB or more<br />
|-<br />
! Classification<br />
| Level 4<br />
|-<br />
! Learn More<br />
| Visit [https://it.ucalgary.ca/secure-computing-platform The SCDS Website]<br />
|-<br />
! Request Access<br />
| Visit [https://ucalgary.service-now.com/kb_view.do?sysparm_article=KB0030163 ServiceNow to request access]<br />
|}<br />
<br />
== ResearchFS ==<br />
ResearchFS is a UofC-hosted SMB/CIFS storage solution funded and operated by RCS. It is available by request to faculty and staff with active research data.<br />
{| class="wikitable"<br />
! Capacity<br />
| 1TB with quota increases available on request. <br />
|-<br />
! Classification<br />
| Level 1 - 2<br />
|-<br />
! Request Access<br />
| Visit [https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=fe66b3a7db297300897e4b8b0b96199d ServiceNow to request access]<br />
|}<br />
=== Service Description ===<br />
You may use ResearchFS to store your active research data files. ResearchFS is intended to be used as a research group or project share. It is available on campus, or off campus using the IT-supported VPN client. Information on how to download and install the VPN client can be found here: https://ucalgary.service-now.com/it?id=kb_article&sys_id=880e71071381ae006f3afbb2e144b05c (IT account login may be required).<br />
All ResearchFS users must have a UofC IT account.<br />
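<br />
Because ResearchFS is an SMB/CIFS share, it can be mapped as a network drive on Windows or mounted from the command line on Linux and macOS. The sketch below is illustrative only: the server name, share name, and domain (<code>researchfs.ucalgary.ca</code>, <code>mylab</code>, <code>UC</code>) are placeholders, so use the path and credentials provided when your share is created.<br />
<syntaxhighlight lang="bash"><br />
# Linux: mount the share over CIFS (requires the cifs-utils package; names are placeholders)
sudo mkdir -p /mnt/researchfs
sudo mount -t cifs //researchfs.ucalgary.ca/mylab /mnt/researchfs \
    -o username=your_it_username,domain=UC

# macOS: mount the same share with mount_smbfs (or use Finder -> Go -> Connect to Server)
mkdir -p ~/researchfs
mount_smbfs //your_it_username@researchfs.ucalgary.ca/mylab ~/researchfs
</syntaxhighlight><br />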
<br />
=== Data recovery ===<br />
ResearchFS takes a snapshot daily, shortly after midnight, and keeps snapshots for 30 days. You should be able to recover a deleted file for up to 30 days, provided it was in your share overnight. If you create and delete a file within the same day, no snapshot will be available to recover it from. ResearchFS presents snapshots through the Windows 'Previous Versions' functionality. If you are not familiar with using this, or if you are on a Linux or macOS device, you can request a restore through ServiceNow.<br />
<br />
For backup, we replicate changes to a distant data center every hour. The storage hardware that hosts your data is located in the basement of the Math Sciences building and the backup is in the HRIC building, so in the case of an on-campus disaster, your data should be safe.<br />
<br />
=== Support for ResearchFS ===<br />
If you have questions, please contact the IT Support Centre.<br />
: Mon – Fri: 8:30 am – 5:00 pm; Sat, Sun & holidays: 10:00 am – 2:00 pm.<br />
: Live Chat: ucalgary.ca/it<br />
: Email: itsupport@ucalgary.ca<br />
: Phone: 403.220.5555<br />
: In person: 773 Math Science<br />
<br />
<br />
<br />
== OneDrive for Business ==<br />
OneDrive for Business is a storage solution provided by Microsoft and is available to all faculty and staff.<br />
{| class="wikitable"<br />
! Capacity<br />
| 5 TB with quota increases [https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=438e6d8313896a0053f2d7b2e144b0b9 available on request].<br />
|-<br />
! Classification<br />
| Level 1 - 4<br />
|}<br />
<br />
You may use OneDrive for Business to store your personal and work-related files. <br />
Files stored in OneDrive are private to you by default, but you can choose to share them and collaborate with others. <br />
OneDrive for Business cannot be used as a department or project share space; <br />
there is no group/lab offering with OneDrive. <br />
<br />
While OneDrive provides a secure and compliant location from an IT Security standpoint, <br />
it is not the most suitable location for data that the PI remains accountable for during the five years following completion of a study. <br />
This is not a security issue, but a data management issue.<br />
<br />
For example, if a study stored all of its records in the personal OneDrive of one of its researchers, <br />
and that researcher left the university, the OneDrive account and its contents would be gone in 30 days.<br />
<br />
Microsoft provides automation capabilities for its O365 products.<br />
On a Windows machine, you can use Power Automate (formerly 'Flow') to copy a file to a local file system whenever a new file is created in OneDrive.<br />
<br />
To back up data residing on ARC to your personal OneDrive allocation please see: [[How to transfer data#rclone: rsync for cloud storage]]<br />
<br />
OneDrive requires Multi-Factor Authentication (MFA) to be enabled on your University of Calgary IT account. <br />
<br />
More information can be located in the following article https://ucalgary.service-now.com/kb_view.do?sysparm_article=KB0032351<br />
<br />
<br />
UofC OneDrive data is reportedly hosted in Canada (Markham, Ontario).<br />
<br />
===Support for OneDrive for Business===<br />
If you have questions, please contact the [https://ucalgary.service-now.com/it UService Support Centre].<br />
: Email: it@ucalgary.ca<br />
: Phone: 403.210.9300 or 1.888.342.3802<br />
: Hours: Mon – Fri, 8:00 a.m. to noon and 1:00 p.m. to 4:30 p.m. (closed over the lunch hour)<br />
: Walk-in service: Math Sciences 7th floor, Room 773, Tues – Thurs, 1:00 p.m. to 4:30 p.m.<br />
<br />
===Data recovery===<br />
<br />
===Other Resources===<br />
For more information on OneDrive for Business:<br />
* Operating Level of Agreement KB0032404 (https://ucalgary.service-now.com/it?id=kb_article&sys_id=7f57bddcdb56a3047cab5068dc9619b6)<br />
* OneDrive for Business Getting Started KB0032351 (https://ucalgary.service-now.com/it?id=kb_article&sys_id=60994170db2da7487cab5068dc961900)<br />
* If you are above 90% of your OneDrive quota, you can request an increase here: https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=438e6d8313896a0053f2d7b2e144b0b9. '''Please note''': Microsoft will only increase an allocation while the cloud storage is more than 90% full, so log in to your O365 cloud account to check your usage before making the request.<br />
<br />
Any questions regarding whether data hosted on OneDrive is subject to US-jurisdiction discovery or access should be directed to:<br />
* https://cumming.ucalgary.ca/research-institutes/csm-research-services/legal-research-services (CSM researchers)<br />
* https://research.ucalgary.ca/contact/research-services (non-CSM researchers)<br />
* https://www.ucalgary.ca/legalservices/ (teaching/learning and other non-research enquiries)<br />
<br />
==Office365 SharePoint for research groups==<br />
<br />
To be determined.<br />
<br />
Researchers are expected to be able to request an Office 365 SharePoint site for a group at some point in the future, <br />
which could serve as a group cloud-sharing platform.<br />
<br />
* The official service page:<br />
: https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=b55f2f72132f5240b5b4df82e144b085<br />
<br />
==Personal storage options==<br />
For personal or Level 1 data, you may use an external solution from the Alliance.<br />
You must have an Alliance account to use the service.<br />
It provides functionality similar to Dropbox or Google Drive. <br />
<br />
*'''The Alliance NextCloud''':<br />
:https://nextcloud.computecanada.ca<br />
: 100 GB of storage that can be shared between your computers.<br />
: Alliance documentation: https://docs.alliancecan.ca/wiki/Nextcloud<br />
<br />
== Home Directories and Research Group Allocations on ARC ==<br />
<br />
ARC storage is used to support workflows on the ARC computing cluster. The expectation is that storage on ARC will only be used for active and upcoming computational projects. It is not suitable for long-term or archival storage, as it is not backed up and is not guaranteed to be available for the time periods that are typical of archiving.<br />
ARC is a research cluster, which means it offers high performance but can be stopped for required maintenance when needed. <br />
Thus, ARC cannot be relied on for any kind of service that requires constant availability. <br />
This means, in turn, that ARC's storage cannot and should not be used as the main storage facility for research data. <br />
The master copy of research data should be stored elsewhere, and only the part of that data needed for computational analysis should be copied to ARC.<br />
<br />
=== Home Directories ===<br />
Every user account on ARC has a static 500GB allocation of storage and a maximum of 1.5 million files (including directories). This cannot be increased or decreased. Home directory storage is connected via a network file system to the rest of the cluster and supports fast data transfer to memory on compute nodes. This also means that basic file system commands (like <code>ls</code>, <code>find</code>, and <code>du</code>) take longer to run as the number of files in your home directory increases. In particular, we strongly encourage users to stay under 100,000 files if at all possible. This can be achieved by combining smaller data files into single larger files, using structured data formats rather than a large number of text files, or combining collections of files that will be used together into archives (tar, dar, etc.). Since top-level permissions on home directories are set to prevent other users from reading or executing, home directories are not suitable for sharing data directly with colleagues working on ARC. A Research Group Allocation is a more appropriate place for storing shared data or very large data sets that will be used as part of active computational projects. <br />
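<br />
For example, here is a minimal sketch of checking your current file count and packing a directory of many small files into a single archive (directory and file names are illustrative):<br />
<syntaxhighlight lang="bash"><br />
# Count files and directories under your home directory
find ~ | wc -l

# Pack a directory of many small output files into one compressed archive
tar -czf run_outputs.tar.gz run_outputs/

# Verify the archive is readable, then remove the original small files
tar -tzf run_outputs.tar.gz > /dev/null && rm -r run_outputs/
</syntaxhighlight><br />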
<br />
=== Research Group Allocations (<code>/work</code> and <code>/bulk</code>) ===<br />
The principal investigator (PI) for a research group may request an extended shared allocation for the research group by contacting support@hpc.ucalgary.ca with answers to the following questions (please copy the full text of the questions into your email and write answers under it):<br />
<br />
* How much storage is requested and why is that the amount that you need? <br />
A rationale for a request can be a formal data management plan, or something more informal such as a rough estimate of the size of the primary dataset used for the project and of the outputs expected from the computations you plan to run on ARC over the '''next year'''. <br />
<br />
* What is the requested allocation name? (typically something like <PI name>_lab)<br />
* What is the data classification using the University of Calgary data security classification system?<br />
* Which user or users would be the owner of the allocation? (Full Name and UCalgary Email address, typically the requesting PI but there may be co-PIs)<br />
* Which members of the allocation should be able to request access for new users? (Full Name and UCalgary Email address for active ARC users) <br />
* What is the faculty of the owner or owners? <br />
* Please provide a short description of the lab or project that will use the allocation.<br />
<br />
<br />
'''Example 1''': "We will be processing a 1T dataset by performing 100 experimental runs. <br />
Each experiment will be processed to produce a 6GB output, giving 600GB of the total output data. <br />
We will also need 400GB additional space for post-processing and data management. <br />
Thus, we would like to request '''2TB''' of shared space in total." <br />
<br />
<br />
'''Example 2''': "3 members of our research group need additional shared space on ARC for their independent projects.<br />
Project 1 starts with 100GB of initial data and is expected to generate 800GB of the output results. <br />
Project 2 is going to use simulations and does not use any input data but is expected to generate 2TB of the simulated data for further processing.<br />
The processing will require 200GB of additional space.<br />
Project 3 will be working on a 1TB dataset and is expected to generate about 1TB of the output data. <br />
These projects, therefore, will require 5.1TB of storage. <br />
For convenience of data manipulation and management we would also like to have additional 400GB of extra storage space.<br />
Therefore, we would like to request '''5.5TB''' of shared storage space in total."<br />
<br />
<br />
Work and Bulk storage can be considerably larger than the home directory allocations. However, there are limits on what RCS can provide, as ARC storage provides high-speed access and is expensive to purchase. Typically, '''any request over 10TB''' will require some discussion. Work and Bulk allocations differ in a few ways that influence how they are used. Work storage is faster to access as part of computational jobs on ARC, although the impact is small for jobs that don't involve enormous numbers of reads. Bulk storage is designed to be a target for instrument data (which is typically processed in a way that reads the data a small number of times per job) and can be mounted from instruments elsewhere on campus using SMB. A number of questions come up frequently about Work and Bulk storage; these are addressed in an [[Group Storage Allocation FAQ | FAQ]].<br />
<br />
===== How to add a group member to the access list (<code>/work</code> and <code>/bulk</code>) =====<br />
<br />
Any group member who wants to use the shared storage should send an email to support@hpc.ucalgary.ca asking to be added to the access group, and CC the PI / data owner. '''This confirms that the PI approves the group member's request for access to the shared storage.''' Please note that the access permissions inside the directory are expected to be managed by the data owners.<br />
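<br />
Once you have been added, you can confirm that the new group membership is active (you may need to log out and back in first) and that the allocation is reachable. A quick check, using a hypothetical allocation named <code>smith_lab</code> and illustrative paths:<br />
<syntaxhighlight lang="bash"><br />
# List the groups your account belongs to; the allocation's group should appear
groups

# Inspect ownership and permissions on the allocation directories
ls -ld /work/smith_lab /bulk/smith_lab

# See how much of the allocation is currently in use
du -sh /work/smith_lab
</syntaxhighlight><br />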
<br />
<br />
[[Category:Administration]]<br />
{{Navbox Administration}}<br />
= How to transfer data =<br />
<br />
{{Message Box<br />
|title=Data Transfer Nodes<br />
|message=<br />
For performance and resource reasons, file transfers should be performed on the data transfer node arc-dtn.ucalgary.ca rather than on the ARC login node. Since the ARC DTN has the same shares as ARC, any files you transfer to the DTN will also be available on ARC.<br />
}}<br />
<br />
<br />
= Command Line File Transfer Tools =<br />
You may use the following command-line file transfer utilities on Linux, macOS, and Windows. File transfers using these methods require your computer to be on the University of Calgary campus network or connected via the University of Calgary IT General VPN.<br />
<br />
If you are working on a Windows computer, you will need to install these utilities separately as they are not installed by default. Newer versions of Windows 10 (1903 and up) have '''SSH''' built in as part of the '''openssh''' package. However, you may be better off using one of the [[#Graphical File Transfer Tools|graphical file transfer]] tools listed below.<br />
<br />
== <code>scp</code>: Secure Copy ==<br />
<code>scp</code> is a secure and encrypted method of transferring files between machines via SSH. It is available on Linux and Mac computers by default and can be installed on Windows by installing the OpenSSH package.<br />
<br />
The general format for the command is:<br />
$ scp [options] source destination<br />
<br />
* The <code>source</code> and <code>destination</code> fields can be a local file / directory or a remote one.<br />
* The ''local location'' is a normal Unix path, absolute or relative and <br />
* The ''remote location'' has a format <code>username@remote.system.name:file/path</code>.<br />
* The ''remote relative file path'' is relative to the home directory of the <code>username</code> on the remote system.<br />
<br />
You may see all the available options with <code>scp</code> by viewing the [http://man7.org/linux/man-pages/man1/scp.1.html man page].<br />
<br />
=== Example Usage ===<br />
Common operations are given below. On your desktop, to:<br />
* Transfer a single file (eg. <code>data.dat</code>) to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp data.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Transfer all files ending with <code>.dat</code> to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp *.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* To transfer an entire directory to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp -r my_data_directory/ username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
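* Download a file (eg. <code>results.dat</code>) from ARC to the current directory on your desktop (the remote path below is illustrative): <syntaxhighlight lang="bash"><br />
desktop$ scp username@arc-dtn.ucalgary.ca:projects/project1/results.dat .
</syntaxhighlight><br />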
<br />
== <code>rsync</code> ==<br />
<code>rsync</code> is a utility for transferring and synchronizing files efficiently. Its efficient file synchronization comes from a delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files at the destination. <br />
<br />
<code>rsync</code> can be used to copy files and directories locally on a system or between two computers via SSH. Unlike <code>scp</code>, it is designed to synchronize two locations, so a partial transfer can be resumed by re-running <code>rsync</code> without losing progress. Resuming a partial transfer is not possible with <code>scp</code>.<br />
<br />
The general format for the command is similar to '''scp''':<br />
$ rsync [options] source destination<br />
* The <code>source</code> and <code>destination</code> fields can be a local file / directory or a remote one.<br />
* <code>rsync</code> '''cannot''' transfer files between '''two remote''' locations. The source or the destination must be a local path.<br />
* The ''local location'' is a normal Unix path, absolute or relative and <br />
* The ''remote location'' has a format <code>username@remote.system.name:file/path</code>.<br />
* The ''remote relative file path'' is relative to the home directory of the <code>username</code> on the remote system.<br />
<br />
You may see all the available options with <code>rsync</code> by viewing the [http://man7.org/linux/man-pages/man1/rsync.1.html man page].<br />
<br />
=== Example Usage ===<br />
Common operations are given below. On your desktop, to:<br />
<br />
* Upload a single file (eg. <code>data.dat</code>) from your workstation to ARC: <syntaxhighlight lang="bash"><br />
desktop$ rsync -v data.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Upload all files matching a wildcard (eg. ending in <code>*.dat</code>): <syntaxhighlight lang="bash"><br />
$ rsync -v *.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Upload an entire directory (eg. <code>my_data</code> to <code>~/projects/project2</code>): <syntaxhighlight lang="bash"><br />
$ rsync -axv my_data username@arc-dtn.ucalgary.ca:~/projects/project2/<br />
</syntaxhighlight><br />
* Upload more than one directory: <syntaxhighlight lang="bash"><br />
desktop$ rsync -axv my_data1 my_data2 my_data3 username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Download one file (eg. <code>output.dat</code>) from ARC to the current directory on your workstation: <syntaxhighlight lang="bash"><br />
## Note the '.' at the end of the command which references the current working directory on your computer<br />
desktop$ rsync -v username@arc-dtn.ucalgary.ca:projects/project1/output.dat .<br />
</syntaxhighlight><br />
* Download one directory (eg. <code>outputs</code>) from ARC to the current directory on your workstation:<syntaxhighlight lang="bash"><br />
desktop$ rsync -axv username@arc-dtn.ucalgary.ca:projects/project1/outputs .<br />
</syntaxhighlight><br />
<br />
== <code>sftp</code>: secure file transfer protocol ==<br />
<br />
* Manual page on-line: http://man7.org/linux/man-pages/man1/sftp.1.html<br />
<br />
<br />
'''sftp''' is a file transfer program, similar to '''ftp''', <br />
which performs all operations over an encrypted '''ssh''' transport. <br />
It may also use many features of '''ssh''', such as public key authentication and compression.<br />
<br />
<br />
'''sftp''' has an interactive mode, <br />
in which sftp understands a set of commands similar to those of '''ftp'''.<br />
Commands are case insensitive.<br />
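<br />
A short interactive session might look like the following sketch (the file and directory names are hypothetical): <syntaxhighlight lang="bash"><br />
# Open an interactive sftp session to the ARC data transfer node.<br />
desktop$ sftp username@arc-dtn.ucalgary.ca<br />
sftp> cd projects/project1      # change the remote directory<br />
sftp> put data.dat              # upload a local file<br />
sftp> get output.dat            # download a remote file to the local directory<br />
sftp> bye                       # end the session<br />
</syntaxhighlight><br />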
<br />
== <code>rclone</code>: rsync for cloud storage ==<br />
'''Rclone''' is a command-line program to sync files and directories to and from a number of online storage services.<br />
<br />
* https://rclone.org/<br />
You may back up your data from arc-dtn to your personal 5 TB UCalgary OneDrive to create a safe second copy at a distance.<br />
<br />
[https://rcs.ucalgary.ca/images/8/8e/Rclone_and_OneDrive_on_arc.pdf detailed rclone configuration instructions]<br />
<br />
Please note: if you are syncing OneDrive with a PC or Mac, your new backup of your ARC home directory may be replicated automatically to your computer. You can exclude it in the PC or Mac OneDrive client (Help & Settings -> Settings -> Account -> Choose folders).<br />
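<br />
As an illustration only (the remote name <code>onedrive</code> is an assumption: it is whatever name you chose when running <code>rclone config</code>, and the paths are hypothetical): <syntaxhighlight lang="bash"><br />
# List the remotes you have configured with "rclone config".<br />
arc-dtn$ rclone listremotes<br />
<br />
# Copy a project directory from ARC to OneDrive, showing progress.<br />
arc-dtn$ rclone copy ~/project1 onedrive:arc_backup/project1 --progress<br />
</syntaxhighlight><br />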
<br />
== <code>curl</code> and <code>wget</code>: downloading from the internet ==<br />
To download a file with the ability to resume a partial download:<br />
 curl -C - -O http://example.com/resource.tar.gz<br />
 wget -c http://example.com/resource.tar.gz<br />
<br />
<br />
= Graphical File Transfer Tools =<br />
== FileZilla ==<br />
FileZilla is a free cross-platform file transfer program that can transfer files via FTP and SFTP. <br />
<br />
=== Installation ===<br />
You may obtain FileZilla from the project's official website at: https://filezilla-project.org/download.php?type=client. '''Please note''': the official installer may bundle ads and unwanted software, so be careful when clicking through the installation prompts.<br />
<br />
Alternatively, you may obtain FileZilla from Ninite: https://ninite.com/filezilla<br />
<br />
=== Connecting to ARC ===<br />
If you are working off campus, first connect to the University of Calgary General VPN. Then open FileZilla and connect to <code>arc-dtn.ucalgary.ca</code> on port <code>22</code> using the SFTP protocol.<br />
[[File:Filezilla.jpg|alt=Connecting to ARC using Filezilla|none|thumb|Connecting to ARC using Filezilla]] <br />
<br />
== MobaXterm (Windows) ==<br />
'''MobaXterm''' is the recommended tool for remote access and data transfer on '''Windows'''.<br />
<br />
MobaXterm is a one-stop solution for most remote access work on a compute cluster or a Unix / Linux server.<br />
<br />
It provides many Unix-like utilities for Windows, including an '''SSH''' client and an '''X11''' graphics server, as well as a graphical interface for data transfer operations.<br />
<br />
* Website: https://mobaxterm.mobatek.net/<br />
<br />
== WinSCP (Windows) ==<br />
WinSCP is a free file transfer tool for Windows that supports SFTP and SCP.<br />
<br />
https://winscp.net/eng/index.php<br />
<br />
= Cloud-based File Transfer Services =<br />
== Globus File Transfer ==<br />
Globus File Transfer is a cloud-based service for file transfer and file sharing. It uses GridFTP for high-speed, reliable data transfers.<br><br />
<br />
* Globus Docs: https://docs.globus.org/<br />
<br />
* The Alliance Docs on Globus: https://docs.alliancecan.ca/wiki/Globus<br />
<br />
* Globus How-Tos: https://docs.globus.org/how-to/<br />
<br />
=== How to get started ===<br />
<br />
# Email support@hpc.ucalgary.ca to request that your UCalgary account be added to the campus Globus Plus subscription.<br />
# Check your email for a 'Welcome to Globus' message with instructions on how to federate your @ucalgary identity with Globus services. <br />
<br />
=== Use Globus Web Application to transfer files ===<br />
<br />
# To initiate data transfers using the Globus Web Application, navigate to https://www.globusid.org/login and log in with your UCalgary account.<br />
# On the left panel, click on File Manager to select the two collections you wish to use. For example, to transfer data from the ARC cluster (Collection 1) to the Compute Canada Cedar cluster (Collection 2):<br />
#* Under Collection 1, search for the ARC data transfer node collection, <code>arc-dtn-collection</code>, and select it from the drop-down menu. It is listed with the description "Mapped Collection (GCS) on UCalgary ARC-DTN endpoint". Authenticate using your UCalgary IT credentials. This will bring you to your home directory on ARC.<br />
#* Next, under Collection 2, choose the Compute Canada Cedar data transfer node, 'computecanada#cedar-dtn', from the drop-down menu. Again, authenticate using your Compute Canada credentials, then navigate to the location you want to transfer the file to.<br />
#* Select the file to be transferred from Collection 1 and initiate the transfer. A command-line alternative using the Globus CLI is sketched below.<br />
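<br />
If you prefer to script transfers, the separately installed [https://docs.globus.org/cli/ Globus CLI] can drive the same collections. This is a rough sketch only; the search terms, paths, and the <code>SRC_UUID</code> / <code>DST_UUID</code> placeholders must be replaced with the values shown for your collections: <syntaxhighlight lang="bash"><br />
# Log in to Globus from the command line (opens a browser window for authentication).<br />
$ globus login<br />
<br />
# Find the UUIDs of the source and destination collections.<br />
$ globus endpoint search "arc-dtn-collection"<br />
$ globus endpoint search "cedar"<br />
<br />
# Request a transfer of one file between the two collections, identified by their UUIDs.<br />
$ globus transfer SRC_UUID:/path/to/data.dat DST_UUID:/path/to/data.dat --label "ARC to Cedar"<br />
</syntaxhighlight><br />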
<br />
=== Use Globus Web Application to Share Files ===<br />
You may grant access to a folder in your allocation on ARC to be used for uploading or sharing files with '''any individual with a Globus account''', whether through Compute Canada or their own institution. <br />
<br />
Please see https://docs.globus.org/how-to/share-files/<br />
<br />
==== How it works ====<br />
<br />
'''Note:''' Globus sharing allows data located on the ARC cluster to be shared with people who '''do not have an ARC account'''. <br />
<br />
<br />
For the '''external collaborator''', the required steps are:<br />
<br />
* The individual who needs access to the data on ARC can create an individual GlobusID account at https://globus.org.<br />
: It will be of the form <code>globusid_name@globusid.org</code>.<br />
<br />
* If the individual already has access to '''Globus via another institution''', then that identity can be used instead; a personal GlobusID is not needed.<br />
<br />
<br />
For the '''data owner on ARC''':<br />
<br />
* Log in to https://globus.org using your UofC account.<br />
<br />
* Create a '''shared collection''' from a data directory on ARC.<br />
<br />
* Add the <code>globusid_name@globusid.org</code> identity to the access list of the shared collection, with either '''read-only''' or '''read-write''' permissions.<br />
: After that, the collaborator will be able to access the shared collection using their individual '''GlobusID''' account.<br />
<br />
= Transferring Large Datasets =<br />
<br />
== Using screen and rsync ==<br />
If you want to transfer a large amount of data from a remote Unix system to ARC, you can use <code>rsync</code> to handle the transfer. However, you would have to keep the SSH session from your workstation connected during the entire transfer, which is often inconvenient or infeasible. <br />
<br />
To overcome this, you can run the '''rsync''' transfer inside a '''screen''' virtual session on ARC. '''screen''' creates a terminal session that keeps running on ARC after you disconnect, so you can reconnect to it later from a new SSH session on your workstation.<br />
<br />
To begin, log in to ARC and start a <code>screen</code> session with the <code>screen</code> command:<br />
<pre><br />
# Login to ARC<br />
$ ssh username@arc.ucalgary.ca<br />
<br />
# Start a screen session<br />
$ screen<br />
<br />
# While in the new screen session, start the transfer with rsync.<br />
$ rsync -axv ext_user@external.system:path/to/remote/data .<br />
<br />
# Disconnect from the screen session with the hotkey 'Ctrl-a d'<br />
# You may now disconnect from ARC, close the lid of your laptop, or turn off your computer.<br />
</pre><br />
<br />
To check whether the transfer has finished:<br />
<pre><br />
# Login to ARC<br />
$ ssh username@arc.ucalgary.ca<br />
<br />
# Reconnect to the screen session<br />
$ screen -r<br />
<br />
# If the transfer has finished, close the screen session.<br />
$ exit<br />
<br />
# If the transfer is still running, disconnect from the screen session with the hotkey 'Ctrl-a d'<br />
</pre><br />
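<br />
If you have started more than one <code>screen</code> session, you can list them and reconnect to a specific one. The session identifier below is only an example of the format <code>screen -ls</code> prints; yours will differ:<br />
<pre><br />
# List your screen sessions on ARC.<br />
$ screen -ls<br />
<br />
# Reconnect to a specific session using the identifier shown by "screen -ls".<br />
$ screen -r 12345.pts-0.arc<br />
</pre><br />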
<br />
== Very large files ==<br />
If a file is large and the transfer speed is low, the transfer may fail before the file has been completely transferred. <br />
By default, '''rsync''' restarts an interrupted file from the beginning, so it may not help here (the <code>--partial</code> option, which keeps partially transferred files, may help, but has not been tested recently on ARC).<br />
<br />
One solution is to split the large file into smaller chunks, transfer the chunks using rsync, and then join them on the remote system (ARC, for example):<br />
<br />
<pre><br />
# Large file is 506MB in this example.<br />
$ ls -l t.bin<br />
-rw-r--r-- 1 drozmano drozmano 530308481 Jun 8 11:06 t.bin<br />
<br />
# split the file:<br />
$ split -b 100M t.bin t.bin_chunk.<br />
<br />
# Check the chunks.<br />
$ ls -l t.bin_chunk.*<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.aa<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ab<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ac<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ad<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ae<br />
-rw-r--r-- 1 drozmano drozmano 6020481 Jun 8 11:09 t.bin_chunk.af<br />
<br />
# Transfer the files:<br />
$ rsync -axv t.bin_chunk.* username@arc.ucalgary.ca:<br />
</pre><br />
<br />
Then login to ARC and join the files:<br />
<pre><br />
$ cat t.bin_chunk.* > t.bin<br />
<br />
$ ls -l t.bin<br />
-rw-r--r-- 1 drozmano drozmano 530308481 Jun 8 11:06 t.bin<br />
</pre><br />
Success.<br />
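<br />
To be certain that the joined file is intact, you may also compare checksums on the two systems (a sketch; any matching checksum tool, such as <code>md5sum</code> or <code>sha256sum</code>, will do):<br />
<pre><br />
# On the source system:<br />
$ md5sum t.bin<br />
<br />
# On ARC, after joining the chunks. The two checksums should match.<br />
$ md5sum t.bin<br />
</pre><br />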
<br />
[[Category:Guides]]</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=CloudStack_End_User_Agreement&diff=2527CloudStack End User Agreement2023-06-14T19:11:32Z<p>Tthomas: </p>
<hr />
<div>==Introduction==<br />
CloudStack is an Infrastructure as a Service provided to University of Calgary researchers. It allows them to quickly deploy Virtual Machines to support their research.<br />
<br />
Researchers have a great degree of freedom in how they use CloudStack. They are solely responsible for maintaining any VMs they deploy and for backing up any data and OS/software configuration needed to recover or rebuild their VMs.<br />
<br />
CloudStack is a research environment. While the system is available 24/7, support is only available during University business hours.<br />
<br />
Researchers are asked to follow the best practices listed below to ensure that the system remains available for the campus community.<br />
<br />
==Best Practices==<br />
<br />
# Please stay abreast of security updates for your OS and apply them.<br />
# Back up your data and software configuration to non-CloudStack hosted storage (RCS does not provide backups of VMs and their data).<br />
# Do not run Windows (This infrastructure is not licensed to run Windows).<br />
# The University's "Acceptable Use of Electronic Resources and Information Policy" applies to your work using a VM. Please see [https://www.ucalgary.ca/legal-services/university-policies-procedures/ University Policies and Procedures] and search for "Electronic".<br />
# This infrastructure is only rated to handle Level 1 and Level 2 data. Please see [https://www.ucalgary.ca/legal-services/university-legal-services/operating-standards-guidelines-forms/ University Legal Services] and select "Information Security Classification Standard" for details.<br />
# You are responsible for the appropriate use of the VM by any accounts you have created.<br />
# You should remove/disable accounts that are no longer required.<br />
# All user accounts on the VM must have good passwords. See [https://it.ucalgary.ca/it-security/passwords-do-i-have-change-them/ here] for details on creating strong passwords.<br />
# CloudStack is not meant as a High Performance Computing (HPC) number cruncher. If you have HPC needs, please see "[[How to get an account]]" for details on how to apply for an account on ARC. Not sure what you need? Contact us at [mailto:support@hpc.ucalgary.ca support@hpc.ucalgary.ca].<br />
# If your VM faces the outside world, please consider using appropriate security tools. Contact [mailto:support@hpc.ucalgary.ca support@hpc.ucalgary.ca] for assistance.<br />
<br />
==Important Notes==<br />
In the event of a security incident, IT Operations/Security will shut down affected VMs.<br />
<br />
Non-critical patches/upgrades to CloudStack that are required will happen on Tuesday of each week. Running VMs should not be affected.<br />
<br />
Non-critical patches/upgrades that require a complete restart of CloudStack will occur after one day's email notice.<br />
<br />
Urgent security patches to CloudStack may happen with little to no notice.<br />
<br />
CloudStack is provided as-is, with best effort support. It is not suitable for mission critical, high availability services.<br />
<br />
Please check back with this document as changes may occur.</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=CloudStack_End_User_Agreement&diff=2526CloudStack End User Agreement2023-06-14T19:11:00Z<p>Tthomas: /* Introduction */</p>
<hr />
<div>==Introduction==<br />
CloudStack is an Infrastructure as a Service provided to University of Calgary researchers. It allows them to quickly deploy Virtual Machines to support their research.<br />
<br />
Researchers have a great degree of freedom in how they use CloudStack. They are solely responsible for maintaining any VMs they deploy and for ensuring that they back up any data and OS/software configuration needed to recover their VM.<br />
<br />
CloudStack is a research environment. While the system is available 24/7, support is only available during University business hours.<br />
<br />
Researchers are asked to follow the best practices listed below to ensure that the system remains available for the campus community.<br />
<br />
==Best Practices==<br />
<br />
# Please stay abreast of security updates for your OS and apply them.<br />
# Back up your data and software configuration to non-CloudStack hosted storage (RCS does not provide backups of VMs and their data).<br />
# Do not run Windows (This infrastructure is not licensed to run Windows).<br />
# The University's "Acceptable Use of Electronic Resources and Information Policy" applies to your work using a VM. Please see [https://www.ucalgary.ca/legal-services/university-policies-procedures/ University Policies and Procedures] and search for "Electronic".<br />
# This infrastructure is only rated to handle Level 1 and Level 2 data. Please see [https://www.ucalgary.ca/legal-services/university-legal-services/operating-standards-guidelines-forms/ University Legal Services] and select "Information Security Classification Standard" for details.<br />
# You are responsible for the appropriate use of the VM by any accounts you have created.<br />
# You should remove/disable accounts that are no longer required.<br />
# All user accounts on the VM must have good passwords. See [https://it.ucalgary.ca/it-security/passwords-do-i-have-change-them/ here] for details on creating strong passwords.<br />
# CloudStack is not meant as a High Performance Computing (HPC) number cruncher. If you have HPC needs, please see "[[How to get an account]]" for details on how to apply for an account on ARC. Not sure what you need? Contact us at [mailto:support@hpc.ucalgary.ca support@hpc.ucalgary.ca].<br />
# If your VM faces the outside world, please consider using appropriate security tools. Contact [mailto:support@hpc.ucalgary.ca support@hpc.ucalgary.ca] for assistance.<br />
<br />
==Important Notes==<br />
In the event of a security incident, IT Operations/Security will shut down affected VMs.<br />
<br />
Non-critical patches/upgrades to CloudStack that are required will happen on Tuesday of each week. Running VMs should not be affected.<br />
<br />
Non-critical patches/upgrades that require a complete restart of CloudStack will occur after one day's email notice.<br />
<br />
Urgent security patches to CloudStack may happen with little to no notice.<br />
<br />
CloudStack is provided as-is, with best effort support. It is not suitable for mission critical, high availability services.<br />
<br />
Please check back with this document as changes may occur.</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=CloudStack_End_User_Agreement&diff=2525CloudStack End User Agreement2023-06-14T18:51:29Z<p>Tthomas: /* Best Practices */</p>
<hr />
<div>==Introduction==<br />
CloudStack is an Infrastructure as a Service provided to University of Calgary researchers. It allows them to quickly deploy Virtual Machines to support their research.<br />
<br />
Researchers have a great degree of freedom in how they use CloudStack. They are solely responsible for maintaining any VMs they deploy.<br />
<br />
CloudStack is a research environment. While the system is available 24/7, support is only available during University business hours.<br />
<br />
Researchers are asked to follow the best practices listed below to ensure that the system remains available for the campus community.<br />
<br />
==Best Practices==<br />
<br />
# Please stay abreast of security updates for your OS and apply them.<br />
# Back up your data and software configuration to non-CloudStack hosted storage (RCS does not provide backups of VMs and their data).<br />
# Do not run Windows (This infrastructure is not licensed to run Windows).<br />
# The University's "Acceptable Use of Electronic Resources and Information Policy" applies to your work using a VM. Please see [https://www.ucalgary.ca/legal-services/university-policies-procedures/ University Policies and Procedures] and search for "Electronic".<br />
# This infrastructure is only rated to handle Level 1 and Level 2 data. Please see [https://www.ucalgary.ca/legal-services/university-legal-services/operating-standards-guidelines-forms/ University Legal Services] and select "Information Security Classification Standard" for details.<br />
# You are responsible for the appropriate use of the VM by any accounts you have created.<br />
# You should remove/disable accounts that are no longer required.<br />
# All user accounts on the VM must have good passwords. See [https://it.ucalgary.ca/it-security/passwords-do-i-have-change-them/ here] for details on creating strong passwords.<br />
# CloudStack is not meant as a High Performance Computing (HPC) number cruncher. If you have HPC needs, please see "[[How to get an account]]" for details on how to apply for an account on ARC. Not sure what you need? Contact us at [mailto:support@hpc.ucalgary.ca support@hpc.ucalgary.ca].<br />
# If your VM faces the outside world, please consider using appropriate security tools. Contact [mailto:support@hpc.ucalgary.ca support@hpc.ucalgary.ca] for assistance.<br />
<br />
==Important Notes==<br />
In the event of a security incident, IT Operations/Security will shut down affected VMs.<br />
<br />
Non-critical patches/upgrades to CloudStack that are required will happen on Tuesday of each week. Running VMs should not be affected.<br />
<br />
Non-critical patches/upgrades that require a complete restart of CloudStack will occur after one day's email notice.<br />
<br />
Urgent security patches to CloudStack may happen with little to no notice.<br />
<br />
CloudStack is provided as-is, with best effort support. It is not suitable for mission critical, high availability services.<br />
<br />
Please check back with this document as changes may occur.</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=Storage_Options&diff=2519Storage Options2023-06-12T17:15:23Z<p>Tthomas: /* OneDrive for Business */</p>
<hr />
<div>There are a few options researchers can take advantage of when storing their research data. <br />
<br />
== Data Classification ==<br />
Please review the different data classifications that are outlined by the [https://ucalgary.ca/legal-services/sites/default/files/teams/1/Standards-Legal-Information-Security-Classification-Standard.pdf Information Security Classification Standard]. There are 4 levels of data classification which are summarized in the table below.<br />
<br />
{| class="wikitable"<br />
! Level<br />
! Description<br />
! Example<br />
|-<br />
| Level 1<br />
| Public<br />
|<br />
* Reference data sets<br />
* Published research data<br />
|-<br />
| Level 2<br />
| Internal<br />
|<br />
* Internal memos<br />
* Unpublished research data<br />
* Anonymized or de-identified human subject data<br />
* Library transactions and journals<br />
|-<br />
| Level 3<br />
| Confidential<br />
|<br />
* Faculty/staff employment applications, personnel files, contact information<br />
* Donor or prospective donor information<br />
* Contracts<br />
* Intellectual property<br />
|-<br />
| Level 4<br />
| Restricted<br />
|<br />
* Patient identifiable health information<br />
* identifiable human subject research data<br />
* information subject to special government requirements<br />
|}<br />
<br />
When selecting a storage option, you must use one that meets or exceeds the rated security classification.<br />
<br />
* See also the Collaboration, storage and file shares article in Service Now:<br />
: https://ucalgary.service-now.com/it?id=it_catalog_by_category&sys_id=4dbb82ee13661200c524fc04e144b044<br />
<br />
== Research Data Management ==<br />
We recommend you follow good Research Data Management practices and ensure you have a DMP (Data Management Plan) created to guide your data's lifecycle. DMP Assistant has been created specifically for Canadian scholars and aims to meet any and all Tri-Agency requirements. See: https://assistant.portagenetwork.ca/<br />
<br />
Your DMP can help us support the FAIR (findable, accessible, interoperable and reusable) principles for data management.<br />
<br />
Please consider contacting Libraries and Cultural Resources for assistance. For guidance on general data management and developing a DMP, consult https://library.ucalgary.ca/guides/researchdatamanagement or contact research.data@ucalgary.ca.<br />
<br />
For support using PRISM Dataverse, UofC's institutional data repository, contact digitize@ucalgary.ca.<br />
<br />
If you need to share and preserve your large post-publication data set for a mandated period of time, please visit https://www.frdr-dfdr.ca/repo/ in order to learn more about the national Federated Research Data Repository. <br />
<br />
FRDR aligns with Tri-Agency Principles as a platform for Preservation, Retention and Sharing of research data. see: [http://www.science.gc.ca/eic/site/063.nsf/eng/h_83F7624E.html Tri-Agency Statement of Principles on Digital Data Management]<br />
<br />
<br />
== Secure Compute Data Storage (SCDS) ==<br />
Secure Computing Data Storage (SCDS) is a service provided by Research Computing Services that allows researchers to store restricted and confidential data. Collaboration with Level 4 data stored in SCDS is possible using ShareFile, a secure file sharing and collaboration tool by Citrix.<br />
<br />
{| class="wikitable"<br />
! Capacity<br />
| 10 GB or more<br />
|-<br />
! Classification<br />
| Level 4<br />
|-<br />
! Learn More<br />
| Visit [https://it.ucalgary.ca/secure-computing-platform The SCDS Website]<br />
|-<br />
! Request Access<br />
| Visit [https://ucalgary.service-now.com/kb_view.do?sysparm_article=KB0030163 ServiceNow to request access]<br />
|}<br />
<br />
== AcademicFS ==<br />
AcademicFS is a UofC hosted SMB/CIFS storage solution funded and operated by RCS. It is available by request to faculty and staff with active research data.<br />
{| class="wikitable"<br />
! Capacity<br />
| 100GB with quota increases available on request. <br />
|-<br />
! Classification<br />
| Level 1 - 2<br />
|-<br />
! Request Access<br />
| Visit [https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=fe66b3a7db297300897e4b8b0b96199d ServiceNow to request access]<br />
|}<br />
=== Service Description ===<br />
You may use AcademicFS to store your active research data files. AcademicFS is intended to be used as a research group or project share. AcademicFS is available on campus or off campus using the IT supported VPN client. Information on how to download and install the VPN client can be found here: https://ucalgary.service-now.com/it?id=kb_article&sys_id=880e71071381ae006f3afbb2e144b05c (IT account login may be required).<br />
All AcademicFS users must have a UofC IT account.<br />
<br />
=== Data recovery ===<br />
AcademicFS takes daily snapshots shortly after midnight, which it keeps for 30 days. You should be able to recover a deleted file for up to 30 days, provided it was in your share overnight. If you create a file and delete it within the same day, no snapshot of it will exist to recover from. AcademicFS presents these snapshots through the Windows 'Previous Versions' functionality. If you are not familiar with using this, or if you are on a Linux or MacOS device, you can request a restore through ServiceNow.<br />
<br />
For backup, we replicate changes to a distant data center every hour. The storage hardware which hosts your data is located in the basement of the Math Sciences building and our backup is in the HRIC building, so in case of an on campus disaster, your data should be safe.<br />
<br />
=== Support for AcademicFS ===<br />
If you have questions, please contact the IT Support Centre.<br />
: Mon – Fri: 8:30 am – 5:00 pm; Sat, Sun & holidays: 10:00 am – 2:00 pm.<br />
: Live Chat: ucalgary.ca/it<br />
: Email: itsupport@ucalgary.ca<br />
: Phone: 403.220.5555<br />
: In person: 773 Math Science<br />
<br />
<br />
<br />
== OneDrive for Business ==<br />
OneDrive for Business is a storage solution provided by Microsoft and is available by request to all faculty and staff.<br />
{| class="wikitable"<br />
! Capacity<br />
| 5 TB with quota increases [https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=438e6d8313896a0053f2d7b2e144b0b9 available on request].<br />
|-<br />
! Classification<br />
| Level 1 - 4<br />
|-<br />
! Request Access<br />
| Visit [https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=522b68ebdb83e700897e4b8b0b961997 ServiceNow to request access]<br />
|}<br />
<br />
You may use OneDrive for Business to store your personal and work-related files. <br />
Files stored within OneDrive are private to you by default, but you can optionally share them and collaborate with others. <br />
OneDrive for Business cannot be used as a department or project share space;<br />
there is no group/lab offering with OneDrive. <br />
<br />
<br />
<br />
While OneDrive provides a secure and compliant location from an IT Security standpoint, <br />
it is not well suited for data that the PI remains accountable for during the five years following completion of a study. <br />
This is not a security issue, but a data management issue.<br />
<br />
For example, if a study stored all of its records in the personal OneDrive of one of the researchers, <br />
and that researcher left the university, that OneDrive and its contents would be gone in 30 days.<br />
<br />
Microsoft provides automation capabilities for its O365 products.<br />
If you have a Windows machine, you can use the automation product ‘Flow’ (now called Power Automate) to copy a file to a local file system whenever a new file is created on OneDrive.<br />
<br />
To back up data residing on ARC to your personal OneDrive allocation please see: [[How to transfer data#rclone: rsync for cloud storage]]<br />
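<br />
As a rough illustration only (the linked page above is the authoritative reference), a backup with rclone typically looks like the following. The remote name <code>onedrive</code> and the paths are placeholders; the remote must first be set up interactively with <code>rclone config</code>.<br />
<pre><br />
# One-time setup: define a OneDrive remote (interactive).<br />
$ rclone config<br />
<br />
# Copy a directory from ARC to a folder in your OneDrive allocation.<br />
$ rclone copy /work/mylab/results onedrive:arc-backups/results --progress<br />
</pre><br />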
<br />
OneDrive requires Multi-Factor Authentication (MFA) enabled on your University of Calgary IT account. <br />
<br />
<br />
UofC OneDrive data is reportedly hosted in Canada (Markham Ont).<br />
<br />
===Request Access===<br />
To request OneDrive for Business:<br />
#Submit your request on ServiceNow using the OneDrive for Business request form (https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=522b68ebdb83e700897e4b8b0b961997)<br />
#The IT Support Centre will contact you<br />
#Set a time with IT Support Centre to turn on MFA<br />
# Turn on MFA<br />
#Turn on OneDrive for Business<br />
<br />
===Support for OneDrive for Business===<br />
If you have questions, please contact the IT Support Centre.<br />
: Mon – Fri: 8:30 am – 5:00 pm; Sat, Sun & holidays: 10:00 am – 2:00 pm.<br />
:Live Chat: ucalgary.ca/it<br />
:Email: itsupport@ucalgary.ca<br />
: Phone: 403.220.5555<br />
:In person: 773 Math Science<br />
<br />
===Data recovery===<br />
<br />
===Other Resources===<br />
For more information on OneDrive for Business:<br />
* Operating Level of Agreement KB0032404 (https://ucalgary.service-now.com/it?id=kb_article&sys_id=7f57bddcdb56a3047cab5068dc9619b6)<br />
*OneDrive for Business Getting Started KB0032351 (https://ucalgary.service-now.com/it?id=kb_article&sys_id=60994170db2da7487cab5068dc961900)<br />
*If you are above 90% of your OneDrive quota, you can request an increase here: ( https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=438e6d8313896a0053f2d7b2e144b0b9) PLEASE NOTE: Microsoft will only increase an allocation while the Cloud Storage is more than 90% full. Please log into your O365 cloud account to review before making your request.<br />
<br />
Any questions regarding whether data hosted on OneDrive is subject to US jurisdiction discovery or access should be directed to:<br />
*https://cumming.ucalgary.ca/research-institutes/csm-research-services/legal-research-services (CSM researchers.)<br />
*https://research.ucalgary.ca/contact/research-services (Not CSM Researchers)<br />
*https://www.ucalgary.ca/legalservices/ (for teaching/learning – non research enquiries that make their way to you)<br />
<br />
==Office365 SharePoint for research groups==<br />
<br />
To be determined....<br />
<br />
Researchers will be able to request an Office 365 SharePoint site for a group at some point in the future, <br />
which could be considered a group cloud-sharing platform.<br />
<br />
* The official service page:<br />
: https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=b55f2f72132f5240b5b4df82e144b085<br />
<br />
==Personal storage options==<br />
For personal or Level 1 data, you may use an external solution from the Alliance.<br />
You must have an Alliance account to use the service.<br />
It provides functionality similar to Dropbox or Google Drive. <br />
<br />
*'''The Alliance NextCloud''':<br />
:https://nextcloud.computecanada.ca<br />
: 100 GB of storage that can be shared between your computers.<br />
: Alliance documentation: https://docs.alliancecan.ca/wiki/Nextcloud<br />
<br />
== Home Directories and Research Group Allocations on ARC ==<br />
<br />
ARC storage is used to support workflows on the ARC computing cluster. The expectation is that storage on ARC will only be used for active and upcoming computational projects. It is not suitable for long-term or archival storage, as it is not backed up and is not guaranteed to be available for the time periods that are typical of archiving.<br />
ARC is a research cluster, which means it has high performance but can be stopped for required maintenance when needed. <br />
Thus, ARC cannot be relied on for any kind of service that requires constant availability. <br />
This means, in turn, that ARC's storage cannot and should not be used as the main storage facility for research data. <br />
The master copy of research data should be stored elsewhere; only the portion of that data needed for computational analysis is expected to be copied to ARC.<br />
<br />
=== Home Directories ===<br />
Every user account on ARC has a static 500 GB allocation of storage and a maximum of 1.5 million files (including directories). This cannot be increased or decreased. Home directory storage is connected via a network file system to the rest of the cluster and supports fast data transfer to memory on compute nodes. This also means that basic file system commands (like <code>ls</code>, <code>find</code>, and <code>du</code>) take longer to run as the number of files in your home directory increases. In particular, we strongly encourage users to stay under 100,000 files if at all possible. This can be achieved by combining smaller data files into single larger files, using structured data formats rather than a large number of text files, or combining collections of files that will be used together into archives (tar, dar, etc.). Since top-level permissions on home directories are set to prevent other users from reading or executing, home directories are not suitable for sharing data directly with colleagues working on ARC. A Research Group Allocation is a more appropriate place for storing shared data or very large data sets that will be used as part of active computational projects. <br />
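<br />
For illustration (a minimal sketch; the directory and file names are placeholders), many small files can be combined into a single archive to keep the file count down, and extracted again only where the individual files are needed:<br />
<pre><br />
# Combine a directory of many small files into one compressed archive.<br />
$ tar -czf my_dataset.tar.gz my_dataset/<br />
# (Remove the original directory afterwards if the goal is to reduce the file count.)<br />
<br />
# Count the files (including directories) under your home directory<br />
# to check against the 1.5 million file limit.<br />
$ find $HOME | wc -l<br />
<br />
# Extract the archive where the individual files are needed.<br />
$ tar -xzf my_dataset.tar.gz<br />
</pre><br />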
<br />
=== Research Group Allocations (<code>/work</code> and <code>/bulk</code>) ===<br />
The principal investigator (PI) for a research group may request an extended shared allocation for the research group by contacting support@hpc.ucalgary.ca with answers to the following questions (please copy the full text of the questions into your email and write answers under it):<br />
<br />
* How much storage is requested and why is that the amount that you need? <br />
A rationale for a request can be a formal data management plan or something more informal, such as a rough estimate of the size of the primary dataset used for a project and a rough estimate of the size of the outputs expected from the computations you plan to run on ARC over the '''next year.''' <br />
<br />
* What is the requested allocation name? (typically something like <PI name>_lab)<br />
* What is the data classification using the University of Calgary data security classification system?<br />
* Which user or users would be the owner of the allocation? (Full Name and UCalgary Email address, typically the requesting PI but there may be co-PIs)<br />
* Which members of the allocation should be able to request access for new users? (Full Name and UCalgary Email address for active ARC users) <br />
* What is the faculty of the owner or owners? <br />
* Please provide a short description of the lab or project that will use the allocation.<br />
<br />
<br />
'''Example 1''': "We will be processing a 3T dataset consisting of 1000 experimental runs. Each experiment will be processed to produce a 6GB output and we will need some further space for post-processing. We would like to request 12TB total." <br />
<br />
'''Example 2''': "Our research group has 5 members with separate projects. 3 have projects that will use 1TB of data and 2 have projects that will require 3TB of data. We would like to request 10TB total."<br />
<br />
<br />
Work and Bulk storage can be considerably larger than the home directory allocations. However, there are limits on what RCS can provide as ARC storage provides high-speed access and is expensive to purchase. Typically, '''any request over 10TB''' will require some discussion. Work and Bulk allocations differ in a few ways that influence how they are used. Work storage is faster to access as part of computational jobs on ARC although the impact is small for jobs that don't involve enormous numbers of reads. Bulk storage is designed to be a target for instrument data (which is typically processed in a way that reads data a small number of times per job) and is capable of mounting instruments elsewhere on campus using SMB. A number of questions come up frequently about Work and Bulk storage and these are addressed in an [[Group Storage Allocation FAQ | FAQ]].<br />
[[Category:Guides]]<br />
<br />
===== How to add a group member to the access list (<code>/work</code> and <code>/bulk</code>)? =====<br />
<br />
Any group member who wants to use the shared storage should send an email to support@hpc.ucalgary.ca to be added to the access group and CC the PI/data owner. '''This will confirm that the PI approves the group member's request for access to the shared storage.''' Please note that the access permissions inside the directory are expected to be managed by the data owners.</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=TALC_Cluster&diff=2440TALC Cluster2023-04-26T14:51:02Z<p>Tthomas: /* Selecting a partition */</p>
<hr />
<div>{{TALC Cluster Status}}{{Message Box<br />
|icon=Security Icon.png<br />
|title=Cybersecurity awareness at the U of C<br />
|message=Please note that there are typically about 950 phishing attempts targeting University of Calgary accounts each month. This is just a reminder to be careful about computer security issues, both at home and at the University. Please visit https://it.ucalgary.ca/it-security for more information, tips on secure computing, and how to report suspected security problems.}}<br />
<br />
This guide gives an overview of the Teaching and Learning Cluster (TALC) at the University of Calgary and is intended to be read by new account holders getting started on TALC. This guide covers topics such as the hardware and performance characteristics, available software, usage policies, and how to log in and run jobs. <br />
<br />
==Introduction==<br />
TALC is a cluster of computers created by Research Computing Services (RCS) in response to requests for a central computing resource to support academic courses and workshops offered at the University of Calgary. It is a complement to the Advanced Research Computing (ARC) cluster that is used for research, rather than educational purposes. The software environment in the TALC and ARC clusters is very similar and workflows between the two clusters are identical. What students learn about using TALC will have direct applicability to using ARC should they go on to use ARC for research work. <br />
<br />
If you are the instructor for a course that could benefit from using TALC, please review this guide and the [[TALC Terms of Use]] and then contact us at support@hpc.ucalgary.ca to discuss your requirements. <br />
<br />
Please note that in order to ensure that the appropriate software is available, student accounts are in place, and appropriate training has been provided for your teaching assistants, it is best to start this discussion several months prior to the start of the course.<br />
<br />
If you are a student in a course using TALC, please review this guide for basic instructions in using the cluster. Questions should first be directed to the teaching assistants or instructor for your course.<br />
<br />
===Obtaining an account===<br />
TALC account requests are expected to be submitted by the course instructor rather than by individual students. You must have a University of Calgary IT account in order to use TALC. If you do not have a University of Calgary IT account or email address, please register for one at https://itregport.ucalgary.ca/. In order to ensure TALC is provisioned in time for a course start date, the instructor should submit the initial list of @ucalgary.ca accounts needed for the course 2 weeks before the start date.<br />
<br />
=== Getting Support ===<br />
{{Message Box<br />
|title=Need Help or have other TALC Related Questions?<br />
|message='''Students''', please send TALC-related questions to your course instructor or teaching assistants.<br /><br />
'''Course instructors and TAs''', please report system issues to support@hpc.ucalgary.ca.<br />
|icon=Support Icon.png}}<br />
<br />
==Hardware==<br />
The TALC cluster is comprised of repurposed research clusters that are a few generations old. As a result, individual processor performance will not be comparable to the latest processors but should be sufficient for educational purposes and course work. <br />
{| class="wikitable"<br />
!Partition<br />
!Description<br />
!Nodes<br />
!CPU Cores, Model, and Year<br />
!Installed Memory<br />
!GPU<br />
!Network<br />
|-<br />
|gpu<br />
|GPU Compute<br />
|3<br />
|12 cores, 2x Intel Xeon Bronze 3204 CPU @ 1.90GHz (2019)<br />
|192 GB<br />
|5x NVIDIA Corporation TU104GL [Tesla T4]<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|cpu16<br />
|General Purpose Compute<br />
|36<br />
|16 cores, 2x Eight-Core Intel Xeon CPU E5-2650 @ 2.00GHz (2012)<br />
|64 GB<br />
|N/A<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|bigmem<br />
|General Purpose Compute<br />
|2<br />
|32 cores, 4x Intel(R) Xeon(R) CPU E7-4830 @ 2.13GHz (2015)<br />
|1024 GB<br />
|N/A<br />
|40 Gbit/s InfiniBand<br />
|}<br />
<br />
===Storage===<br />
{{Message Box<br />
| title=No Backup Policy!<br />
| message=You are responsible for your own backups. Since accounts on TALC and related data are removed shortly after the associated course has finished, you should download anything you need to save to your own computer before the end of the course.<br />
}}<br />
<br />
TALC is connected to a network disk storage system. This storage is split across the <code>/home</code> and <code>/scratch</code> file systems. <br />
====<code>/home</code>: Home file system====<br />
Each user has a directory under /home, which is the default working directory when logging in to TALC. Each home directory has a per-user quota of 500 GB. This limit is fixed and cannot be increased.<br />
<br />
Note on file sharing: due to security concerns, permissions set with <code>chmod</code> to allow other users to read or write to your home directory will be automatically reverted by an automated system process unless an explicit exception is made. If you need to share files with others on the TALC cluster, please write to support@hpc.ucalgary.ca to ask for such an exception.<br />
<br />
====<code>/scratch</code>: Scratch file system for large job-oriented storage====<br />
Associated with each job, under the <code>/scratch</code> directory, a subdirectory is created that can be referenced in job scripts as <code>/scratch/${SLURM_JOB_ID}</code>. You can use that directory for temporary files needed during the course of a job. Up to 30 TB of storage may be used, per user (total for all your jobs) in the <code>/scratch</code> file system. <br />
<br />
Data in <code>/scratch</code> associated with a given job will be deleted automatically, without exception, five days after the job finishes.<br />
<br />
== Software ==<br />
{{Message Box<br />
| title=Software Package Requests<br />
| message=Course instructors or teaching assistants should write to support@hpc.ucalgary.ca if additional software is required for their course.<br />
}}<br />
<br />
All TALC nodes run a version of Rocky Linux. For your convenience, we have packaged commonly used software packages and dependencies as modules available under <code>/global/software</code>. If your software package is not available as a module, you may also try Anaconda which allows users to manage and install custom packages in an isolated environment.<br />
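<br />
As a minimal sketch (the environment name and packages are placeholders, and your course may have its own instructions), a custom Anaconda environment can be created in your home directory after loading the Anaconda module:<br />
<syntaxhighlight lang="bash"><br />
# Load the Anaconda module, then create and activate a personal environment.<br />
$ module load python/anaconda-3.6-5.1.0<br />
$ conda create --name mycourse numpy pandas<br />
$ source activate mycourse<br />
</syntaxhighlight><br />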
<br />
For a list of available packages that have been made available, please see [[ARC Software pages]]. <br />
<br />
=== Modules ===<br />
The setup of the environment for using some of the installed software is through the <code>module</code> command.<br />
<br />
Software packages bundled as a module will be available under <code>/global/software</code> and can be listed with the <code>module avail</code> command.<br />
<syntaxhighlight lang="bash"><br />
$ module avail<br />
</syntaxhighlight><br />
<br />
To enable Python, load the Python module by running:<br />
<syntaxhighlight lang="bash"><br />
$ module load python/anaconda-3.6-5.1.0<br />
</syntaxhighlight><br />
<br />
To unload the Python module, run:<br />
<syntaxhighlight lang="bash"><br />
$ module remove python/anaconda-3.6-5.1.0<br />
</syntaxhighlight><br />
<br />
To see currently loaded modules, run:<br />
<syntaxhighlight lang="bash"><br />
$ module list<br />
</syntaxhighlight><br />
<br />
==Using TALC==<br />
{{Message Box<br />
|title=Usage subject to [[TALC Terms of Use]]<br />
|message=Please review the [[TALC Terms of Use]] prior to using TALC.<br />
|icon=Support Icon.png}}<br />
<br />
===Logging in===<br />
To log in to TALC, connect using SSH to talc.ucalgary.ca. Connections to TALC are accepted only from the University of Calgary network (on campus) or through the University of Calgary General VPN (off campus).<br />
<br />
When logging into a new TALC account for '''the first time''' the new user has to agree to the '''conditions of use''' for TALC. <br />
Until the conditions are accepted the account is not active.<br />
<br />
See [[Connecting to RCS HPC Systems]] for more information.<br />
<br />
===Working interactively===<br />
TALC uses the Linux operating system. The program that responds to your typed commands and allows you to run other programs is called the Linux shell. There are several different shells available, but, by default you will use one called bash. It is useful to have some knowledge of the shell and a variety of other command-line programs that you can use to manipulate files. If you are new to Linux systems, we recommend that you work through one of the many online tutorials that are available, such as the [http://www.ee.surrey.ac.uk/Teaching/Unix/index.html UNIX Tutorial for Beginners (external link)] provided by the University of Surrey. The tutorial covers such fundamental topics, among others, as creating, renaming and deleting files and directories, how to produce a listing of your files and how to tell how much disk space you are using. For a more comprehensive introduction to Linux, see [http://linuxcommand.sourceforge.net/tlcl.php The Linux Command Line (external link)].<br />
<br />
The TALC login node may be used for such tasks as editing files, compiling programs and running short tests while developing programs. CPU intensive workloads on the login node should be restricted to under 15 minutes as per [[General Cluster Guidelines and Policies|our cluster guidelines]]. For interactive workloads exceeding 15 minutes, use the '''[[Running_jobs#Interactive_jobs|salloc command]]''' to allocate an interactive session on a compute node.<br />
<br />
The default <code>salloc</code> allocation is 1 CPU and 1 GB of memory. Adjust this by specifying <code>-n CPU#</code> and <code>--mem Megabytes</code>. You may request up to 5 hours of CPU time for interactive jobs.<br />
salloc --time 5:00:00 --partition cpu16<br />
<br />
===Running non-interactive jobs (batch processing)===<br />
Production runs and longer test runs should be submitted as (non-interactive) batch jobs, in which commands to be executed are listed in a script (text file). Batch job scripts are submitted using the <code>sbatch</code> command, part of the Slurm job management and scheduling software. #SBATCH directive lines at the beginning of the script are used to specify the resources needed for the job (cores, memory, run time limit, and any specialized hardware needed).<br />
<br />
Most of the information on the Running Jobs page on the Compute Canada web site is also relevant for submitting and managing batch jobs and reserving processors for interactive work on TALC. One major difference between running jobs on the TALC and Compute Canada clusters is in selecting the type of hardware that should be used for a job. On TALC, you choose the hardware to use primarily by specifying a partition, as described below.<br />
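<br />
As a minimal illustration (a sketch only; the resource amounts, partition, and script names are placeholders to adapt to your course work), a TALC batch job script might look like this:<br />
<syntaxhighlight lang="bash"><br />
#!/bin/bash<br />
#SBATCH --partition=cpu16      # one of the TALC partitions described below<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks=1<br />
#SBATCH --cpus-per-task=1<br />
#SBATCH --mem=4G<br />
#SBATCH --time=02:00:00        # must not exceed the partition time limit<br />
<br />
# Set up the software environment.<br />
module load python/anaconda-3.6-5.1.0<br />
<br />
# Run the computation.<br />
python my_script.py<br />
</syntaxhighlight><br />
The script would then be submitted with <code>sbatch my_job.slurm</code>, and its status can be checked with <code>squeue -u $USER</code>.<br />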
<br />
===Using JupyterHub on TALC===<br />
TALC has a JupyterHub server which runs a Jupyter server on one of the TALC compute nodes and provides all the necessary encryption and plumbing to deliver the notebook to your computer. To access this service you must have a TALC account. Point your browser at http://talc.ucalgary.ca and log in with your usual UC account. As of this writing, the job that runs the Jupyter notebook is allocated 1 CPU and 10 GiB of memory on a cpu16 node.<br />
<br />
<br />
'''Please note''' that before using the JupyterHub on TALC, a new user has to log in to their TALC account using SSH at least once to '''accept the conditions of TALC use'''. <br />
Until the conditions are accepted, the account is not activated and the JupyterHub login will not work either.<br />
<br />
===Selecting a partition===<br />
TALC currently has the following partitions available for use. The <code>gpu</code> and <code>cpu12</code> partitions refer to the same nodes. The <code>cpu12</code> partition was created to expose only the CPUs on the GPU hardware for general purpose use. Each GPU node has 5 Tesla T4 GPUs installed, but you may only request one per job within the TALC environment.<br />
{| class="wikitable"<br />
!Partition<br />
!Description<br />
!Nodes<br />
!Cores<br />
!Memory <br />
!Memory Request Limit<br />
!Time Limit<br />
!GPU Request per Job<br />
!Network<br />
|-<br />
|gpu<br />
|GPU Compute<br />
|3<br />
|12 cores<br />
|192 GB<br />
|190 GB<br />
|24 hours<br />
|1x NVIDIA Corporation TU104GL [Tesla T4]<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|cpu12<br />
|General Purpose Compute<br />
|3<br />
|12 cores<br />
|192 GB<br />
|190 GB<br />
|24 hours<br />
|None<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|cpu16<br />
|General Purpose Compute<br />
|36<br />
|16 cores<br />
|64 GB<br />
|62 GB<br />
|24 hours<br />
|None<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|bigmem<br />
|General Purpose Compute<br />
|2<br />
|32 cores<br />
|1024 GB<br />
|1022 GB<br />
|24 hours<br />
|None<br />
|40 Gbit/s InfiniBand<br />
|}<br />
There are some aspects to consider when selecting a partition including:<br />
* Resource requirements in terms of memory and CPU cores<br />
* Hardware specific requirements, such as GPU or CPU Instruction Set Extensions<br />
* Partition resource limits and potential wait time<br />
* Software support for parallel processing using Message Passing Interface (MPI), OpenMP, etc. For example, MPI-based parallel processing can distribute memory across multiple nodes, so that per-node memory requirements could be lower, whereas OpenMP or single-process serial code is restricted to one node and would therefore require a higher-memory node.<br />
<br />
Since resources that are requested are reserved for your job, please request only as much CPU and memory as your job requires to avoid reducing the cluster efficiency. If you are unsure which partition to use or the specific resource requests that are appropriate for your jobs, '''Course instructors and TAs''' may contact us at [mailto:support@hpc.ucalgary.ca support@hpc.ucalgary.ca] and we would be happy to work with you.<br />
<br />
=== Using a partition ===<br />
<br />
==== CPU only jobs ====<br />
To select the <code>cpu16</code> partition, include the following line in your batch job script:<syntaxhighlight lang="text"><br />
#SBATCH --partition=cpu16<br />
</syntaxhighlight>You may also start an interactive session with <code>salloc</code>:<syntaxhighlight lang="text"><br />
$ salloc --time 1:00:00 -p cpu16<br />
</syntaxhighlight><br />
<br />
==== GPU jobs ====<br />
In TALC, you are limited to exactly 1 GPU per job. Jobs that request 0 GPUs, or 2 or more GPUs, will not be scheduled.<br />
<br />
To submit a job using the <code>gpu</code> partition with one GPU request, include the following in your batch job script:<syntaxhighlight lang="text"><br />
#SBATCH --partition=gpu<br />
#SBATCH --gpus-per-node=1<br />
</syntaxhighlight><br />
<br />
Like the previous example, you may also request interactive sessions with GPU nodes using <code>salloc</code>. Just specify the <code>gpu</code> partition and the number of GPUs required. <syntaxhighlight lang="text"><br />
$ salloc --time 1:00:00 -p gpu -n 1 --gpus-per-node 1 <br />
</syntaxhighlight>You may verify that a GPU was assigned to your job or interactive session by running <code>nvidia-smi</code>. This command will show you the status of the GPU that was assigned to you.<syntaxhighlight lang="text"><br />
$ nvidia-smi<br />
+-----------------------------------------------------------------------------+<br />
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |<br />
|-------------------------------+----------------------+----------------------+<br />
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |<br />
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |<br />
| | | MIG M. |<br />
|===============================+======================+======================|<br />
| 0 Tesla T4 Off | 00000000:3B:00.0 Off | 0 |<br />
| N/A 36C P0 14W / 70W | 0MiB / 15109MiB | 5% Default |<br />
| | | N/A |<br />
+-------------------------------+----------------------+----------------------+<br />
<br />
+-----------------------------------------------------------------------------+<br />
| Processes: |<br />
| GPU GI CI PID Type Process name GPU Memory |<br />
| ID ID Usage |<br />
|=============================================================================|<br />
| No running processes found |<br />
+-----------------------------------------------------------------------------+<br />
</syntaxhighlight><br />
<br />
==== Partition limitations ====<br />
In addition to the hardware limitations of the nodes within the partition, please be aware that there may also be policy limits imposed on your account for each partition. These limits restrict the number of cores, nodes, or GPUs that can be used at any given time. Since the limits are applied on a partition-by-partition basis, using resources in one partition should not affect the available resources you can use in another partition.<br />
<br />
These limits can be listed by running:<br />
<syntaxhighlight lang="bash"><br />
$ sacctmgr show qos format=Name,MaxWall,MaxTRESPU%20,MaxSubmitJobs<br />
Name MaxWall MaxTRESPU MaxSubmit <br />
---------- ----------- -------------------- --------- <br />
normal 1-00:00:00 <br />
cpulimit cpu=48 <br />
gpucpulim+ cpu=18 <br />
gpulimit cpu=2,gres/gpu=1 <br />
</syntaxhighlight><br />
<br />
=== Time limits ===<br />
Use the <code>--time</code> directive to tell the job scheduler the maximum time that your job might run. For example:<br />
#SBATCH --time=hh:mm:ss<br />
<br />
You can use <code>scontrol show partitions</code> or <code>sinfo</code> to see the current maximum time that a job can run.<br />
<syntaxhighlight lang="bash" highlight="6"><br />
$ scontrol show partitions<br />
PartitionName=cpu16<br />
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL<br />
AllocNodes=ALL Default=YES QoS=cpulimit<br />
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO<br />
MaxNodes=UNLIMITED MaxTime=1-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED<br />
Nodes=n[1-36]<br />
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO<br />
OverTimeLimit=NONE PreemptMode=OFF<br />
State=UP TotalCPUs=576 TotalNodes=36 SelectTypeParameters=NONE<br />
JobDefaults=(null)<br />
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED<br />
<br />
<br />
<br />
</syntaxhighlight><br />
<br />
Alternatively, with <code>sinfo</code> under the <code>TIMELIMIT</code> column:<br />
<syntaxhighlight lang="bash"><br />
$ sinfo<br />
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST<br />
cpu12 up 1-00:00:00 3 idle t[1-3]<br />
cpu16 up 1-00:00:00 36 idle n[1-36]<br />
bigmem up 1-00:00:00 2 idle bigmem[1-2]<br />
gpu up 1-00:00:00 3 idle t[1-3]<br />
<br />
...<br />
</syntaxhighlight><br />
[[Category:TALC]]<br />
[[Category:Guides]]</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Guide&diff=2439ARC Cluster Guide2023-04-26T13:48:48Z<p>Tthomas: cleaned up a link to the data classification standard</p>
<hr />
<div>{{ARC Cluster Status}}<br />
<br />
{{Message Box<br />
|title=[[Support|Need Help or have other ARC Related Questions?]]<br />
|message=For all general RCS related issues, questions, or comments, please contact us at support@hpc.ucalgary.ca.<br />
|icon=Support Icon.png}}<br />
<br />
This guide gives an overview of the Advanced Research Computing (ARC) cluster at the University of Calgary and is intended to be read by new account holders getting started on ARC. This guide covers topics such as the hardware and performance characteristics, available software, usage policies and how to log in and run jobs. ARC can be used with data that a researcher has classified as Level 1 or Level 2, as described in the UCalgary [https://www.ucalgary.ca/legal-services/sites/default/files/teams/1/Standards-Legal-Information-Security-Classification-Standard.pdf Information Security Classification Standard]. <br />
<br />
== Introduction ==<br />
The ARC is a high performance computing (HPC) cluster that is available for research projects based at the University of Calgary. This compute cluster comprises hundreds of servers connected by a high-bandwidth interconnect. Special resources within the cluster include large-memory nodes and GPU nodes. You may learn more about ARC's hardware in the [[ARC Cluster Guide#Hardware|hardware section below]]. ARC can be accessed through a [[Linux Introduction|command line interface]] or via a web interface called Open OnDemand.<br />
<br />
This cluster can be used for running large numbers (hundreds) of concurrent serial (one core) jobs, OpenMP or other thread-based jobs, shared-memory parallel code using up to 40 or 80 threads per job (depending on the partition), distributed-memory (MPI-based) parallel code using up to hundreds of cores, or jobs that take advantage of Graphics Processing Units (GPUs).<br />
<br />
Historically, ARC was assembled primarily from older, disparate Linux-based clusters that were formerly offered to researchers from across Canada, such as Breezy, Lattice, and Parallel. In addition, a large-memory compute node (Bigbyte) was salvaged from the now-retired local Storm cluster. In January 2019, a major addition to ARC with modern hardware was purchased. In 2020, compute clusters from CHGI were migrated into ARC.<br />
<br />
=== How to Get Started ===<br />
If you have a project you think would be appropriate for ARC, please email support@hpc.ucalgary.ca and mention the intended research and software you plan to use. You must have a University of Calgary IT account in order to use ARC.<br />
* For users that do not have a University of Calgary IT account or email address, please register for one at https://itregport.ucalgary.ca/.<br />
* For users external to the University, such as for users collaborating on a research project at the University of Calgary, please contact us and mention the project leader you are collaborating with.<br />
<br />
Once your access to ARC has been granted, you will be able to immediately make use of the cluster using your University of Calgary IT account by following the [[ARC_Cluster_Guide#Using_ARC|usage guide outlined below]].<br />
<br />
== Using ARC ==<br />
<br />
{{Message Box<br />
|icon=Security Icon.png<br />
|title=Cybersecurity awareness at the U of C<br />
|message=Please note that there are typically about 950 phishing attempts targeting University of Calgary accounts each month. This is just a reminder to be careful about computer security issues, both at home and at the University. Please visit https://it.ucalgary.ca/it-security for more information, tips on secure computing, and how to report suspected security problems.}}<br />
<br />
=== Logging in ===<br />
To log in to ARC, connect using SSH to <code>arc.ucalgary.ca</code> on port <code>22</code>. Connections to ARC are accepted only from the University of Calgary network (on campus) or through the University of Calgary General VPN (off campus).<br />
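<br />
For example, from a terminal on the campus network or on the university VPN (replace <code>username</code> with your UCalgary IT username):<br />
 ssh username@arc.ucalgary.ca<br />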
<br />
See [[Connecting to RCS HPC Systems]] for more information.<br />
=== How to interact with ARC ===<br />
<br />
The ARC cluster is a collection of compute nodes connected by a high-speed network. On ARC, computations are submitted as jobs. Once submitted, the jobs are assigned to compute nodes by the job scheduler as resources become available.<br />
<br />
[[File:Cluster.png]]<br />
<br />
You can access ARC with your UCalgary IT user credentials. Once connected, you will be placed on the ARC login node, which is intended for basic tasks such as submitting jobs, monitoring job status, managing files, and editing text. It is a shared resource to which multiple users are connected at the same time. Thus, intensive tasks are not allowed on the login node, as they may prevent other users from connecting or submitting their computations. <br />
[tannistha.nandi@arc ~]$ <br />
The job scheduling system on ARC is called SLURM. On ARC, there are two SLURM commands that can allocate resources to a job under appropriate conditions: ‘salloc’ and ‘sbatch’. They both accept the same set of command line options with respect to resource allocation. <br />
<br />
'''‘salloc’''' launches an interactive session, typically for tasks under 5 hours. <br />
Once an interactive job session is created, you can do things such as explore research datasets, start R or Python sessions to test your code, compile software applications, etc.<br />
<br />
a. Example 1: The following command requests 1 CPU on 1 node for 1 task, along with 1 GB of RAM, for one hour. <br />
[tannistha.nandi@arc ~]$ salloc --mem=1G -c 1 -N 1 -n 1 -t 01:00:00<br />
salloc: Granted job allocation 6758015<br />
salloc: Waiting for resource configuration<br />
salloc: Nodes fc4 are ready for job<br />
[tannistha.nandi@fc4 ~]$ <br />
<br />
<br />
b. Example 2: The following command requests 1 GPU on 1 node belonging to the gpu-v100 partition, along with 1 GB of RAM, for 1 hour. Generic resource scheduling (--gres) is used to request GPU resources.<br />
[tannistha.nandi@arc ~]$ salloc --mem=1G -t 01:00:00 -p gpu-v100 --gres=gpu:1<br />
salloc: Granted job allocation 6760460<br />
salloc: Waiting for resource configuration<br />
salloc: Nodes fg3 are ready for job<br />
[tannistha.nandi@fg3 ~]$<br />
<br />
Once you finish the work, type 'exit' at the command prompt to end the interactive session,<br />
[tannistha.nandi@fg3 ~]$ exit<br />
[tannistha.nandi@fg3 ~]$ salloc: Relinquishing job allocation 6760460<br />
This ensures that the allocated resources are released from your job and made available to other users.<br />
<br />
'''‘sbatch’''' submits computations as batch jobs to run on the cluster. You can submit a job script, for example job-script.slurm, via 'sbatch' for execution. <br />
[tannistha.nandi@arc ~]$ sbatch job-script.slurm<br />
When resources become available, they get allocated to this task. Batch jobs are suited for tasks that run for long periods of time without any user supervision. When the job-script terminates, the allocation is released. <br />
Please review the section on how to prepare job scripts for more information.<br />
<br />
=== Prepare job scripts ===<br />
Job scripts are text files saved with an extension '.slurm', for example, 'job-script.slurm'. <br />
A job script looks something like this:<br />
''#!/bin/bash''<br />
####### Reserve computing resources #############<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks=1<br />
#SBATCH --cpus-per-task=1<br />
#SBATCH --time=01:00:00<br />
#SBATCH --mem=1G<br />
#SBATCH --partition=cpu2019<br><br />
####### Set environment variables ###############<br />
module load python/anaconda3-2018.12<br><br />
####### Run your script #########################<br />
python myscript.py<br />
<br />
The first line contains the text "#!/bin/bash" so that the file is interpreted as a bash script.<br />
<br />
It is followed by lines that start with '#SBATCH' to communicate with SLURM. You may add as many #SBATCH directives as needed to reserve computing resources for your task. The above example requests one CPU on a single node for 1 task, along with 1 GB of RAM, for one hour on the cpu2019 partition.<br />
<br />
Next, you have to set up environment variables, either by loading the modules centrally installed on ARC or by exporting the path to software installed in your home directory. The above example loads an available Python module.<br />
<br />
Finally, include the Linux command that executes your local script.<br />
<br />
Note that failing to specify part of a resource allocation request (most notably '''time''' and '''memory''') will result in bad resource requests, as the defaults are not appropriate for most cases. Please refer to the section 'Running non-interactive jobs' for more examples.<br />
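<br />
Once a job has been submitted, a few standard SLURM commands are useful for keeping an eye on it. This is a general illustration; the job ID shown is a placeholder.<br />
 # List your pending and running jobs.<br />
 squeue -u $USER<br />
 # Show the details of a specific job.<br />
 scontrol show job 6758015<br />
 # Cancel a job that is no longer needed.<br />
 scancel 6758015<br />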
<br />
== Hardware ==<br />
Since the ARC cluster is a conglomeration of many different compute clusters, the hardware within ARC can vary widely in terms of performance and capabilities. To mitigate compatibility issues with different hardware, we group similar hardware into its own Slurm partition to ensure your workload runs as consistently as possible within one partition. Please carefully review the hardware specs for each of the partitions below to avoid any surprises.<br />
<br />
=== Partition Hardware Specs ===<br />
When submitting jobs to ARC, you may specify a partition that your job will run on. Please choose a partition that is most appropriate for your work.<br />
<br />
* See also [[How to find available partitions on ARC]].<br />
<br />
A few things to keep in mind when choosing a partition:<br />
* Specific workloads requiring special Intel Instruction Set Extensions may only work on newer Intel CPUs. <br />
* If working with multi-node parallel processing, ensure your software and libraries support the partition's interconnect networking.<br />
* While older partitions may be slower, they may be less busy and have little to no wait times.<br />
<br />
If you are unsure which partition to use or need assistance on selecting an appropriate partition, please see [[#Selecting_a_Partition|the Selecting a Partition Section]] below. <br />
<br />
{| class="wikitable"<br />
! Partition<br />
! Description<br />
! Nodes<br />
! CPU Cores, Model, and Year<br />
! Memory<br />
! GPU<br />
! Network<br />
|-<br />
| -<br />
| ARC Login Node<br />
| 1<br />
| 16 cores, 2x Intel(R) Xeon(R) CPU E5620 @ 2.40GHz (Westmere, 2010)<br />
| 48 GB<br />
| N/A<br />
| 40 Gbit/s InfiniBand<br />
|-<br />
| gpu-v100<br />
| GPU Partition<br />
| 13<br />
| 80 cores, 2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (Skylake, 2019)<br />
| 754 GB<br />
| 2x Tesla V100-PCIE-16GB<br />
| 100 Gbit/s Omni-Path<br />
|-<br />
|gpu-a100<br />
|GPU Partition<br />
|5<br />
|40 cores, 1x Intel(R) Xeon(R) Silver 4316 CPU @ 2.30GHz (Ice Lake, 2021)<br />
|512 GB<br />
|2x GA100 A100 PCIe 80GB<br />
|100 Gbit/s Mellanox Infiniband<br />
|-<br />
| cpu2021<br />
| General Purpose Compute<br />
| 48<br />
| 48 cores, 2x Intel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz (Cascade Lake, 2021)<br />
| 185 GB<br />
| N/A <br />
| 100 Gbit/s Mellanox Infiniband<br />
|-<br />
| cpu2019<br />
| General Purpose Compute<br />
| 14<br />
| 40 cores, 2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (Skylake, 2019)<br />
| 190 GB<br />
| N/A <br />
| 100 Gbit/s Omni-Path<br />
|-<br />
| apophis<br />
| General Purpose Compute<br />
| 21<br />
| 40 cores, 2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (Skylake, 2019)<br />
| 190 GB<br />
| N/A <br />
| 100 Gbit/s Omni-Path<br />
|-<br />
| razi<br />
| General Purpose Compute<br />
| 41<br />
| 40 cores, 2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (Skylake, 2019)<br />
| 190 GB<br />
| N/A <br />
| 100 Gbit/s Omni-Path<br />
|-<br />
| bigmem<br />
| Big Memory Nodes<br />
| 2<br />
| 80 cores, 4x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (Skylake, 2019)<br />
| 3022 GB<br />
| N/A <br />
| 100 Gbit/s Omni-Path<br />
|-<br />
| pawson<br />
| General Purpose Compute<br />
| 13<br />
| 40 cores, 2x Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz (Skylake, 2019)<br />
| 190 GB<br />
| N/A<br />
| 100 Gbit/s Omni-Path<br />
|-<br />
|cpu2017<br />
|General Purpose Compute<br />
|14<br />
|56 cores, 2x Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz (Broadwell, 2016)<br />
|256 GB<br />
|N/A<br />
|40 Gbit/s InfiniBand<br />
|-<br />
| theia<br />
| Former Theia cluster<br />
| 20<br />
| 56 cores, 2x Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz (Broadwell, 2016)<br />
| 188 GB<br />
| N/A <br />
| 40 Gbit/s InfiniBand<br />
|-<br />
| cpu2013<br />
| Former hyperion cluster<br />
| 12<br />
| 32 cores, 2x Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz (Sandy Bridge, 2012)<br />
| 126 GB<br />
| N/A<br />
| 40 Gbit/s InfiniBand<br />
|-<br />
| lattice<br />
| Former Lattice cluster<br />
| 307<br />
| 8 cores, 2x Intel(R) Xeon(R) CPU L5520 @ 2.27GHz (Nehalem, 2009)<br />
| 12 GB<br />
| N/A<br />
| 40 Gbit/s InfiniBand<br />
|-<br />
| single<br />
| Former Lattice cluster<br />
| 168<br />
| 8 cores, 2x Intel(R) Xeon(R) CPU L5520 @ 2.27GHz (Nehalem, 2009)<br />
| 12 GB<br />
| N/A<br />
| 40 Gbit/s InfiniBand<br />
|-<br />
| parallel<br />
| Former Parallel Cluster<br />
| 576<br />
| 12 cores, 2x Intel(R) Xeon(R) CPU E5649 @ 2.53GHz (Westmere, 2011)<br />
| 24 GB<br />
| N/A<br />
| 40 Gbit/s InfiniBand<br />
|}<br />
<br />
===ARC Cluster Storage===<br />
Usage of ARC cluster storage is outlined by our [[ARC Storage Terms of Use]] page.<br />
<br />
{{Message Box<br />
| title=No Backup Policy!<br />
| message=You are responsible for your own backups. Many researchers will have accounts with Compute Canada and may choose to back up their data there (the Project file system accessible through the Cedar cluster would often be used). <br />
<br />
Please contact us at support@hpc.ucalgary.ca if you want more information about this option.<br />
<br />
You can also back up data to your UofC OneDrive for Business allocation, which starts at 5 TB; see https://rcs.ucalgary.ca/How_to_transfer_data#rclone:_rsync_for_cloud_storage for details. Contact the support center for questions regarding OneDrive for Business.<br />
}}<br />
<br />
The ARC cluster has around 2 petabytes of shared disk storage available across the entire cluster, as well as temporary storage local to each of the compute nodes. Please refer to the individual sections below for capacity limits and usage policies.<br />
<br />
Use the <code>arc.quota</code> command on ARC to determine the available space on your various volumes and home directory.<br />
<br />
{| class="wikitable"<br />
!File system<br />
!Description<br />
!Capacity<br />
|-<br />
|<code>/home</code><br />
|User home directories<br />
|500 GB (per user)<br />
|-<br />
|<code>/work</code><br />
|Research project storage<br />
|Up to hundreds of TB<br />
|-<br />
|<code>/scratch</code><br />
|Scratch space for temporary files<br />
|Up to 15 TB<br />
|-<br />
|<code>/tmp</code><br />
|Temporary space local to each compute node<br />
|Dependent on available storage on nodes. Verify with <code>df -h</code>.<br />
|-<br />
|<code>/dev/shm</code><br />
|Small temporary in-memory disk space local to each compute node<br />
|Dependent on memory size set in your Slurm job.<br />
|}<br />
====<code>/home</code>: Home file system====<br />
Each user has a directory under <code>/home</code>, which is the default working directory when logging in to ARC. Each home directory has a per-user quota of 500 GB. This limit is fixed and cannot be increased. Researchers requiring additional storage exceeding what is available in their home directory may use <code>/work</code> and <code>/scratch</code>.<br />
<br />
Note on file sharing: Due to security concerns, permissions set using <code>chmod</code> to allow other users to read or write to your home directory will be automatically reverted by an automated system process unless an explicit exception is made. If you need to share files with other researchers on the ARC cluster, please write to support@hpc.ucalgary.ca to ask for such an exception.<br />
<br />
====<code>/scratch</code>: Scratch file system for large job-oriented storage====<br />
Associated with each job, under the <code>/scratch</code> directory, a subdirectory is created that can be referenced in job scripts as <code>/scratch/${SLURM_JOB_ID}</code>. You can use that directory for temporary files needed during the course of a job. Up to 15 TB of storage may be used, per user (total for all your jobs) in the <code>/scratch</code> file system. <br />
<br />
Data in <code>/scratch</code> associated with a given job will be deleted automatically, without exception, five days after the job finishes.<br />
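<br />
As a minimal sketch (the resource values, input file, and program path are placeholders, not anything ARC-specific), a job script could stage its temporary files through the per-job scratch directory like this:<br />
<syntaxhighlight lang="bash"><br />
#!/bin/bash<br />
#SBATCH --time=01:00:00<br />
#SBATCH --mem=4G<br />
#SBATCH --ntasks=1<br />
<br />
# Work inside the per-job scratch directory created for this job<br />
cd /scratch/${SLURM_JOB_ID}<br />
<br />
cp ${HOME}/input.dat .                             # hypothetical input file<br />
${HOME}/bin/my_program input.dat > results.out     # hypothetical program<br />
cp results.out ${HOME}/project/                    # copy results back before the 5-day cleanup<br />
</syntaxhighlight><br />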
<br />
====<code>/work</code>: Work file system for larger projects====<br />
If you need more space than provided in <code>/home</code> and the <code>/scratch</code> job-oriented space is not appropriate for your case, please write to support@hpc.ucalgary.ca with an explanation, including an indication of how much storage you expect to need and for how long. If approved, you will then be assigned a directory under <code>/work</code> with an appropriately large quota.<br />
<br />
====<code>/tmp</code>,<code>/var/tmp</code>: Temporary files====<br />
You may use <code>/tmp</code> or <code>/var/tmp</code> for storing temporary files generated by your job. Both are stored on disk local to the compute node and are not shared across the cluster. The files stored here will be removed immediately after your job terminates.<br />
<br />
==== <code>/dev/shm</code>, <code>/run/user/$uid</code>: In-memory temporary files ====<br />
<code>/dev/shm</code> and <code>/run/user/$UID</code> are writable locations for temporary files backed by virtual memory. They can be used if faster I/O is required and are ideal for workloads that perform many small reads and writes to share data between processes or as a fast cache. The amount of data you can write here depends on the amount of free memory available to your job. The files stored at these locations will be removed immediately after your job terminates.<br />
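<br />
Many programs respect the <code>TMPDIR</code> environment variable, so one hypothetical way to take advantage of the in-memory space (the program name is a placeholder, and anything written there counts against your job's memory request) is:<br />
<syntaxhighlight lang="bash"><br />
# Direct temporary files to the in-memory file system for faster small I/O<br />
export TMPDIR=/dev/shm<br />
${HOME}/bin/my_io_heavy_program    # hypothetical program whose temporary files now live in memory<br />
</syntaxhighlight><br />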
<br />
== Software ==<br />
All ARC nodes run the latest version of Rocky Linux 8 with the same set of base software packages. To maintain the stability and consistency of all nodes, any additional dependencies that your software requires must be installed under your account. For your convenience, we have packaged commonly used software packages and dependencies as modules available under <code>/global/software</code>. If your software package is not available as a module, you may also try Anaconda, which allows users to manage and install custom packages in an isolated environment.<br />
<br />
For a list of the packages that have been made available, please see [[ARC Software pages]]. <br />
<br />
Please contact us at support@hpc.ucalgary.ca if you need additional software installed.<br />
<br />
==== Modules ====<br />
The environment for using much of the installed software is set up through the <code>module</code> command. An overview of [https://www.westgrid.ca//support/modules modules on WestGrid (external link)] is largely applicable to ARC.<br />
<br />
Software packages bundled as a module will be available under <code>/global/software</code> and can be listed with the <code>module avail</code> command.<br />
<syntaxhighlight lang="bash"><br />
$ module avail<br />
</syntaxhighlight><br />
<br />
To enable Python, load the Python module by running:<br />
<syntaxhighlight lang="bash"><br />
$ module load python/anaconda-3.6-5.1.0<br />
</syntaxhighlight><br />
<br />
To unload the Python module, run:<br />
<syntaxhighlight lang="bash"><br />
$ module remove python/anaconda-3.6-5.1.0<br />
</syntaxhighlight><br />
<br />
To see currently loaded modules, run:<br />
<syntaxhighlight lang="bash"><br />
$ module list<br />
</syntaxhighlight><br />
<br />
By default, no modules are loaded on ARC. If you wish to use a specific module, such as the Intel compilers or the Open MPI parallel programming packages, you must load the appropriate module.<br />
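<br />
For example, combining the module system with the Anaconda approach mentioned above, a hypothetical sketch of creating an isolated environment for your own packages (the environment name and package list are placeholders) might look like:<br />
<syntaxhighlight lang="bash"><br />
$ module load python/anaconda-3.6-5.1.0<br />
$ conda create --name myproject numpy scipy<br />
$ source activate myproject<br />
(myproject) $ python -c "import numpy; print(numpy.__version__)"<br />
</syntaxhighlight><br />
<br />
(Older conda releases use <code>source activate</code>; newer ones use <code>conda activate</code>.)<br />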
<br />
== Job submission ==<br />
<br />
=== Interactive Jobs ===<br />
The ARC login node may be used for such tasks as editing files, compiling programs and running short tests while developing programs. We suggest CPU intensive workloads on the login node be restricted to under 15 minutes as per [[General Cluster Guidelines and Policies|our cluster guidelines]]. For interactive workloads exceeding 15 minutes, use the '''[[Running_jobs#Interactive_jobs|salloc command]]''' to allocate an interactive session on a compute node.<br />
<br />
The default salloc allocation is 1 CPU and 1 GB of memory. Adjust this by specifying <code>-n CPU#</code> and <code>--mem Megabytes</code>. You may request up to 5 hours of CPU time for interactive jobs.<br />
salloc --time=5:00:00 --partition=cpu2019<br />
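<br />
For example, a hypothetical interactive session with 4 tasks and 8192 MB of memory for two hours on the cpu2019 partition would be:<br />
 salloc --time=2:00:00 -n 4 --mem=8192 --partition=cpu2019<br />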
<br />
Always use salloc or srun to start an interactive job. Do not SSH directly to a compute node as SSH sessions will be refused without an active job running.<br />
<br />
<!-- This information doesn't seem that useful or relevant to running interactive jobs. Move to getting started section?<br />
ARC uses the Linux operating system. The program that responds to your typed commands and allows you to run other programs is called the Linux shell. There are several different shells available, but, by default you will use one called bash. It is useful to have some knowledge of the shell and a variety of other command-line programs that you can use to manipulate files. If you are new to Linux systems, we recommend that you work through one of the many online tutorials that are available, such as the [http://www.ee.surrey.ac.uk/Teaching/Unix/index.html UNIX Tutorial for Beginners (external link)] provided by the University of Surrey. The tutorial covers such fundamental topics, among others, as creating, renaming and deleting files and directories, how to produce a listing of your files and how to tell how much disk space you are using. For a more comprehensive introduction to Linux, see [http://linuxcommand.sourceforge.net/tlcl.php The Linux Command Line (external link)].<br />
--><br />
<br />
=== Running non-interactive jobs (batch processing) ===<br />
Production runs and longer test runs should be submitted as (non-interactive) batch jobs, in which the commands to be executed are listed in a script (text file). Batch job scripts are submitted using the <code>sbatch</code> command, part of the Slurm job management and scheduling software. #SBATCH directive lines at the beginning of the script are used to specify the resources needed for the job (cores, memory, run-time limit and any specialized hardware needed).<br />
<br />
Most of the information on the [https://docs.computecanada.ca/wiki/Running_jobs Running Jobs (external link)] page on the Compute Canada web site is also relevant for submitting and managing batch jobs and reserving processors for interactive work on ARC. One major difference between running jobs on the ARC and Compute Canada clusters is in selecting the type of hardware that should be used for a job. On ARC, you choose the hardware to use primarily by specifying a partition, as described below.<br />
<br />
=== Selecting a Partition ===<br />
There are several aspects to consider when selecting a partition, including:<br />
* Resource requirements in terms of memory and CPU cores<br />
* Hardware specific requirements, such as GPU or CPU Instruction Set Extensions<br />
* Partition resource limits and potential wait time<br />
* Software support for parallel processing using Message Passing Interface (MPI), OpenMP, etc.<br />
** E.g. MPI-based code can distribute memory across multiple nodes, so the per-node memory requirement may be lower, whereas OpenMP or other single-node code restricted to one node needs a node with enough memory for the whole job.<br />
** Note: MPI code running on hardware with Omni-Path networking should be compiled with Omni-Path networking support. This is provided by loading the <code>openmpi/2.1.3-opa</code> or <code>openmpi/3.1.2-opa</code> modules prior to compiling (see the sketch after this list).<br />
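<br />
For example, a hypothetical compile of an MPI program for the Omni-Path partitions might look like (the source file name is a placeholder):<br />
<syntaxhighlight lang="bash"><br />
$ module load openmpi/3.1.2-opa    # Omni-Path build of Open MPI<br />
$ mpicc -O2 -o my_mpi_app my_mpi_app.c<br />
</syntaxhighlight><br />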
<br />
Since resources that are requested are reserved for your job, please request only as much CPU and memory as your job requires to avoid reducing the cluster efficiency. If you are unsure which partition to use or the specific resource requests that are appropriate for your jobs, please contact us at [mailto:support@hpc.ucalgary.ca support@hpc.ucalgary.ca] and we would be happy to work with you.<br />
<br />
{| class="wikitable" style="width: 100%;"<br />
!Partition<br />
!Description<br />
!Cores/node<br />
!Memory Request Limit<br />
!Time Limit<br />
!GPU<br />
!Networking<br />
|-<br />
|cpu2021<br />
|General Purpose Compute<br />
|48<br />
|185,000 MB<br />
|7 days ‡<br />
|<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|cpu2019<br />
|General Purpose Compute<br />
|40<br />
|185,000 MB<br />
|7 days ‡<br />
|<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|bigmem<br />
|Big Memory Compute<br />
|80<br />
|3,000,000 MB<br />
|24 hours ‡<br />
|<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|gpu-v100<br />
|GPU Compute<br />
|80<br />
|753,000 MB<br />
|24 hours ‡<br />
|2<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|apophis&dagger;<br />
|Private Research Partition<br />
|40<br />
|185,000 MB<br />
|7 days ‡<br />
|<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|razi&dagger;<br />
|Private Research Partition<br />
|40<br />
|185,000 MB<br />
|7 days ‡<br />
|<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|pawson&dagger;<br />
|Private Research Partition<br />
|40<br />
|185,000 MB<br />
|7 days ‡<br />
|<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|sherlock&dagger;<br />
|Private Research Partition<br />
|7<br />
|185,000 MB<br />
|7 days ‡<br />
|<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|theia&dagger;<br />
|Private Research Partition<br />
|28<br />
|188,000 MB<br />
|7 days ‡<br />
|<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|synergy&dagger;<br />
|Private Research Partition<br />
|14<br />
|245,000 MB<br />
|7 days ‡<br />
|<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|cpu2013<br />
|Legacy General Purpose Compute<br />
|16<br />
|120,000 MB<br />
|7 days ‡<br />
|<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|lattice<br />
|Legacy General Purpose Compute<br />
|8<br />
|12,000 MB<br />
|7 days ‡<br />
|<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|parallel<br />
|Legacy General Purpose Compute<br />
|12<br />
|23,000 MB<br />
|7 days ‡<br />
|<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|single<br />
|Legacy Single-Node Job Compute<br />
|8<br />
|12,000 MB<br />
|7 days ‡<br />
|<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|cpu2021-bf24<br />
|Back-fill Compute (2021-era hardware, 24h)<br />
|48<br />
|185,000 MB<br />
|24 hours ‡<br />
|<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|cpu2019-bf05<br />
|Back-fill Compute (2019-era hardware, 5h)<br />
|40<br />
|185,000 MB<br />
|5 hours ‡<br />
|<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|cpu2017-bf05<br />
|Back-fill Compute (2017-era hardware, 5h)<br />
|14<br />
|245,000 MB<br />
|5 hours ‡<br />
|<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|+ style="caption-side: bottom; text-align: left; font-weight: normal;" | &dagger; These partitions contain hardware contributed to ARC by particular researchers and should only be used by members of their research groups. However, they have generously allowed their compute nodes to be shared with others outside their research groups for short jobs. Special 'back-fill' (-bf) partitions are available for use by all ARC users for short jobs; see the Backfill partitions section below.<br />‡ As time limits may be changed by administrators to adjust to maintenance schedules or system load, the values given in the tables are not definitive. See the Time limits section below for commands you can use on ARC itself to determine current limits.<br />
|}<br />
<br />
==== Backfill partitions ====<br />
Backfill partitions can be used by all users on ARC for short-term jobs. The hardware backing these partitions is generously contributed by researchers. We recommend including the backfill partitions for short-term jobs, as doing so may reduce your job's wait time and increase overall cluster throughput.<br />
<br />
Previously, each contributing research group had their own backfill partition. Since June 2021, we have merged:<br />
<br />
* apophis-bf, pawson-bf, and razi-bf into cpu2019-bf05 <br />
* theia-bf and synergy-bf into cpu2017-bf05<br />
<br />
The naming scheme of the backfill partitions is the CPU generation year, followed by -bf and the time limit in hours. For example, cpu2017-bf05 would represent a backfill partition containing processors from 2017 with a time limit of 5 hours.<br />
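<br />
For example, a hypothetical four-hour job could list a backfill partition ahead of the regular one, so that it runs wherever resources free up first (multiple comma-separated partitions are allowed, as described below):<br />
 #SBATCH --time=04:00:00<br />
 #SBATCH --partition=cpu2019-bf05,cpu2019<br />
<br />
Note that the requested time must fit within the backfill partition's limit (5 hours in this case) for the backfill partition to be usable.<br />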
<br />
==== Hardware resource and job policy limits ====<br />
In addition to the hardware limitations, please be aware that there may also be policy limits imposed on your account for each partition. These limits restrict the number of cores, nodes, or GPUs that can be used at any given time. Since the limits are applied on a partition-by-partition basis, using resources in one partition should not affect the available resources you can use in another partition.<br />
<br />
These limits can be listed by running:<br />
<syntaxhighlight lang="bash"><br />
$ sacctmgr show qos format=Name,MaxWall,MaxTRESPU%20,MaxSubmitJobs<br />
Name MaxWall MaxTRESPU MaxSubmit<br />
---------- ----------- -------------------- ---------<br />
normal 7-00:00:00 2000<br />
breezy 3-00:00:00 cpu=384 2000<br />
gpu 7-00:00:00 13000<br />
cpu2019 7-00:00:00 cpu=240 2000<br />
gpu-v100 1-00:00:00 cpu=80,gres/gpu=4 2000<br />
single 7-00:00:00 cpu=408,node=75 2000<br />
razi 7-00:00:00 2000<br />
</syntaxhighlight><br />
<br />
==== Specifying a partition in a job ====<br />
Once you have decided which partition best suits your computation, you can select one or more partitions on a job-by-job basis by including the <code>--partition</code> keyword in an <code>#SBATCH</code> directive in your batch job. Multiple partitions should be comma-separated. If you omit the partition specification, the system will try to assign your job to appropriate hardware based on other aspects of your request. <br />
<br />
In some cases, you really should specify the partition explicitly. For example, if you are running single-node jobs with thread-based parallel processing requesting 8 cores you could use:<br />
<syntaxhighlight lang="bash"><br />
#SBATCH --mem=0 ❶<br />
#SBATCH --nodes=1 ❷<br />
#SBATCH --ntasks=1 ❸<br />
#SBATCH --cpus-per-task=8 ❹<br />
#SBATCH --partition=single,lattice ❺ <br />
</syntaxhighlight><br />
<br />
A few things to mention in this example:<br />
# <code>--mem=0</code> allocates all available memory on the compute node for the job. This effectively allocates the entire node for your job.<br />
# <code>--nodes=1</code> allocates 1 node for the job<br />
# <code>--ntasks=1</code> your job has a single task<br />
# <code>--cpus-per-task=8</code> asks for 8 CPUs per task. This job in total will request 8 * 1, or 8 CPUs.<br />
# <code>--partition=single,lattice</code> specifies that this job can run on either single or lattice.<br />
Suppose that your job requires at most 8 CPU cores and 10 GB of memory. The above Slurm request would be valid and efficient, since your job fits neatly on a single node in the single and lattice partitions. However, if you fail to specify the partition, Slurm may schedule your job to a partition with larger nodes, such as cpu2019, where each node has 40 cores and 190 GB of memory. If your job lands on such a node, it will effectively waste 32 cores and 180 GB of memory, because <code>--mem=0</code> not only requests all 190 GB on that node but also prevents other jobs from being scheduled on the same node.<br />
<br />
If you don't specify a partition, please give greater thought to the memory specification to make sure that the scheduler will not assign your job more resources than are needed.<br />
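<br />
For instance, if the same hypothetical 8-core, 10 GB job is submitted without naming a partition, a request that asks only for what the job needs might look like:<br />
 #SBATCH --nodes=1<br />
 #SBATCH --ntasks=1<br />
 #SBATCH --cpus-per-task=8<br />
 #SBATCH --mem=10G<br />
<br />
This leaves the remaining cores and memory of whichever node the scheduler picks available to other jobs.<br />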
<br />
Parameters such as '''--ntasks-per-node''', '''--cpus-per-task''', '''--mem''' and '''--mem-per-cpu''' also have to be adjusted according to the capabilities of the hardware. The product of --ntasks-per-node and --cpus-per-task should be less than or equal to the number given in the "Cores/node" column. The '''--mem''' parameter (or the product of '''--mem-per-cpu''' and '''--cpus-per-task''') should be less than the "Memory Request Limit" shown. If using whole nodes, you can specify '''--mem=0''' to request the maximum amount of memory per node.<br />
<br />
===== Examples =====<br />
Here are some examples of specifying the various partitions.<br />
<br />
As mentioned in the [[#Hardware|Hardware]] section above, the ARC cluster was expanded in January 2019. To select the 40-core general purpose nodes specify:<br />
<br />
#SBATCH --partition=cpu2019<br />
<br />
To run on the Tesla V100 GPU-enabled nodes, use the '''gpu-v100''' partition. You will also need to include an SBATCH directive in the form '''--gres=gpu:n''' to specify the number of GPUs, n, that you need. For example, if the software you are running can make use of both GPUs on a gpu-v100 partition compute node, use:<br />
<br />
#SBATCH --partition=gpu-v100 --gres=gpu:2<br />
<br />
For very large memory jobs (more than 185,000 MB), specify the bigmem partition:<br />
<br />
#SBATCH --partition=bigmem<br />
<br />
If the more modern computers are too busy or you have a job well-suited to run on the compute nodes described in the legacy hardware section above, choose the cpu2013, Lattice or Parallel compute nodes by specifying the corresponding partition keyword:<br />
<br />
#SBATCH --partition=cpu2013<br />
#SBATCH --partition=lattice<br />
#SBATCH --partition=parallel<br />
<br />
There is an additional partition called '''single''' that provides nodes similar to the lattice partition but is intended for single-node jobs. Select the single partition with:<br />
<br />
#SBATCH --partition=single<br />
<br />
=== Time limits ===<br />
Use the <code>--time</code> directive to tell the job scheduler the maximum time that your job might run. For example:<br />
#SBATCH --time=hh:mm:ss<br />
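<br />
Slurm also accepts a <code>days-hours:minutes:seconds</code> form; for example, a two-day limit would be:<br />
 #SBATCH --time=2-00:00:00<br />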
<br />
You can use <code>scontrol show partitions</code> or <code>sinfo</code> to see the current maximum time that a job can run.<br />
<syntaxhighlight lang="bash" highlight="6"><br />
$ scontrol show partitions<br />
PartitionName=single <br />
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL <br />
AllocNodes=ALL Default=NO QoS=single <br />
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO <br />
MaxNodes=UNLIMITED MaxTime=7-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED <br />
Nodes=cn[001-168] <br />
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO <br />
OverTimeLimit=NONE PreemptMode=OFF <br />
State=UP TotalCPUs=1344 TotalNodes=168 SelectTypeParameters=NONE <br />
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED <br />
</syntaxhighlight><br />
<br />
Alternatively, with <code>sinfo</code> under the <code>TIMELIMIT</code> column:<br />
<syntaxhighlight lang="bash"><br />
$ sinfo <br />
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST <br />
single up 7-00:00:00 1 drain* cn097 <br />
single up 7-00:00:00 1 maint cn002 <br />
single up 7-00:00:00 4 drain* cn[001,061,133,154] <br />
...<br />
</syntaxhighlight><br />
<br />
== Support ==<br />
{{Message Box<br />
|title=[[Support|Need Help or have other ARC Related Questions?]]<br />
|message=For all general RCS related issues, questions, or comments, please contact us at support@hpc.ucalgary.ca.<br />
|icon=Support Icon.png}}<br />
<br />
Please don't hesitate to [[Support|contact us]] directly by email if you need help using ARC or require guidance on migrating and running your workflows to ARC.<br />
<br />
[[Category:ARC]]<br />
[[Category:Guides]]</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Guide&diff=2438ARC Cluster Guide2023-04-26T13:42:42Z<p>Tthomas: added link to data classification standard</p>
<hr />
<div>{{ARC Cluster Status}}<br />
<br />
{{Message Box<br />
|title=[[Support|Need Help or have other ARC Related Questions?]]<br />
|message=For all general RCS related issues, questions, or comments, please contact us at support@hpc.ucalgary.ca.<br />
|icon=Support Icon.png}}<br />
<br />
This guide gives an overview of the Advanced Research Computing (ARC) cluster at the University of Calgary and is intended to be read by new account holders getting started on ARC. This guide covers topics such as the hardware and performance characteristics, available software, usage policies and how to log in and run jobs. ARC can be used with data that a researcher has classified as Level 1 (Lv1) or Level 2 (Lv2), as described in the [https://www.ucalgary.ca/legal-services/sites/default/files/teams/1/Standards-Legal-Information-Security-Classification-Standard.pdf Information Security Classification Standard].<br />
<br />
== Introduction ==<br />
ARC is a high-performance computing (HPC) cluster available for research projects based at the University of Calgary. The cluster comprises hundreds of servers connected by a high-bandwidth interconnect. Special resources within the cluster include large-memory nodes and GPUs. You may learn more about ARC's hardware in the [[ARC Cluster Guide#Hardware|hardware section below]]. ARC can be accessed through a [[Linux Introduction|command line interface]] or via a web interface called Open OnDemand.<br />
<br />
This cluster can be used for running large numbers (hundreds) of concurrent serial (one core) jobs, OpenMP or other thread-based jobs, shared-memory parallel code using up to 40 or 80 threads per job (depending on the partition), distributed-memory (MPI-based) parallel code using up to hundreds of cores, or jobs that take advantage of Graphics Processing Units (GPUs).<br />
<br />
Historically, ARC was primarily composed of older, disparate Linux-based clusters that were formerly offered to researchers from across Canada, such as Breezy, Lattice, and Parallel. In addition, a large-memory compute node (Bigbyte) was salvaged from the now-retired local Storm cluster. In January 2019, a major addition of modern hardware was purchased for ARC. In 2020, compute clusters from CHGI were migrated into ARC.<br />
<br />
=== How to Get Started ===<br />
If you have a project you think would be appropriate for ARC, please email support@hpc.ucalgary.ca and mention the intended research and software you plan to use. You must have a University of Calgary IT account in order to use ARC.<br />
* For users who do not have a University of Calgary IT account or email address, please register for one at https://itregport.ucalgary.ca/.<br />
* For users external to the University, such as collaborators on a research project at the University of Calgary, please contact us and mention the project leader you are working with.<br />
<br />
Once your access to ARC has been granted, you will be able to immediately make use of the cluster using your University of Calgary IT account by following the [[ARC_Cluster_Guide#Using_ARC|usage guide outlined below]].<br />
<br />
== Using ARC ==<br />
<br />
{{Message Box<br />
|icon=Security Icon.png<br />
|title=Cybersecurity awareness at the U of C<br />
|message=Please note that there are typically about 950 phishing attempts targeting University of Calgary accounts each month. This is just a reminder to be careful about computer security issues, both at home and at the University. Please visit https://it.ucalgary.ca/it-security for more information, tips on secure computing, and how to report suspected security problems.}}<br />
<br />
=== Logging in ===<br />
To log in to ARC, connect using SSH to <code>arc.ucalgary.ca</code> on port <code>22</code>. Connections to ARC are accepted only from the University of Calgary network (on campus) or through the University of Calgary General VPN (off campus).<br />
<br />
See [[Connecting to RCS HPC Systems]] for more information.<br />
=== How to interact with ARC ===<br />
<br />
The ARC cluster is a collection of many compute nodes connected by a high-speed network. On ARC, computations are submitted as jobs. Once submitted, the jobs are assigned to compute nodes by the job scheduler as resources become available.<br />
<br />
[[File:Cluster.png]]<br />
<br />
You can access ARC with your UCalgary IT user credentials. Once connected, you are placed on the ARC login node, which is meant for basic tasks such as submitting jobs, monitoring job status, managing files, and editing text. The login node is a shared resource used by multiple users at the same time, so intensive tasks are not allowed on it, as they may prevent other users from connecting or submitting their computations. <br />
[tannistha.nandi@arc ~]$ <br />
The job scheduling system on ARC is called SLURM. On ARC, there are two SLURM commands that can allocate resources to a job under appropriate conditions: ‘salloc’ and ‘sbatch’. They both accept the same set of command line options with respect to resource allocation. <br />
<br />
'''‘salloc’''' launches an interactive session, typically for tasks under 5 hours. <br />
Once an interactive job session is created, you can do things like explore research datasets, start R or Python sessions to test your code, compile software applications, etc.<br />
<br />
a. Example 1: The following command requests 1 CPU on 1 node for 1 task, along with 1 GB of RAM, for an hour. <br />
[tannistha.nandi@arc ~]$ salloc --mem=1G -c 1 -N 1 -n 1 -t 01:00:00<br />
salloc: Granted job allocation 6758015<br />
salloc: Waiting for resource configuration<br />
salloc: Nodes fc4 are ready for job<br />
[tannistha.nandi@fc4 ~]$ <br />
<br />
<br />
b. Example 2: The following command requests 1 GPU on 1 node in the gpu-v100 partition, along with 1 GB of RAM, for 1 hour. Generic resource scheduling (--gres) is used to request GPU resources.<br />
[tannistha.nandi@arc ~]$ salloc --mem=1G -t 01:00:00 -p gpu-v100 --gres=gpu:1<br />
salloc: Granted job allocation 6760460<br />
salloc: Waiting for resource configuration<br />
salloc: Nodes fg3 are ready for job<br />
[tannistha.nandi@fg3 ~]$<br />
<br />
Once you finish the work, type 'exit' at the command prompt to end the interactive session,<br />
[tannistha.nandi@fg3 ~]$ exit<br />
[tannistha.nandi@fg3 ~]$ salloc: Relinquishing job allocation 6760460<br />
This ensures that the allocated resources are released from your job and become available to other users.<br />
<br />
'''‘sbatch’''' submits computations as batch jobs to run on the cluster. For example, you can submit a script named job-script.slurm via 'sbatch' for execution. <br />
[tannistha.nandi@arc ~]$ sbatch job-script.slurm<br />
When resources become available, they get allocated to this task. Batch jobs are suited for tasks that run for long periods of time without any user supervision. When the job-script terminates, the allocation is released. <br />
Please review the section on how to prepare job scripts for more information.<br />
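<br />
After submitting, you can check on the job with standard Slurm commands, for example (the job ID shown is only an illustration taken from the sbatch output):<br />
 [tannistha.nandi@arc ~]$ sbatch job-script.slurm<br />
 Submitted batch job 6758015<br />
 [tannistha.nandi@arc ~]$ squeue -j 6758015<br />
 [tannistha.nandi@arc ~]$ squeue -u $USER<br />
<br />
Here 'squeue -j' shows the state of a particular job, while 'squeue -u' lists all of your queued and running jobs.<br />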
<br />
=== Prepare job scripts ===<br />
Job scripts are text files saved with an extension '.slurm', for example, 'job-script.slurm'. <br />
A job script looks something like this:<br />
''#!/bin/bash''<br />
####### Reserve computing resources #############<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks=1<br />
#SBATCH --cpus-per-task=1<br />
#SBATCH --time=01:00:00<br />
#SBATCH --mem=1G<br />
#SBATCH --partition=cpu2019<br><br />
####### Set environment variables ###############<br />
module load python/anaconda3-2018.12<br><br />
####### Run your script #########################<br />
python myscript.py<br />
<br />
The first line contains the text "#!/bin/bash" so that the file is interpreted as a bash script.<br />
<br />
It is followed by lines that start with '#SBATCH' to communicate with 'SLURM'. You may add as many #SBATCH directives as needed to reserve computing resources for your task. The above example requests one CPU on a single node for 1 task, along with 1 GB of RAM, for an hour on the cpu2019 partition.<br />
<br />
Next, you have to set up environment variables, either by loading the modules centrally installed on ARC or by exporting the path to software installed in your home directory. The above example loads an available Python module.<br />
<br />
Finally, include the Linux command to execute the local script.<br />
<br />
Note that failing to specify part of a resource allocation request (most notably '''time''' and '''memory''') will result in bad resource requests, as the defaults are not appropriate for most cases. Please refer to the section 'Running non-interactive jobs' for more examples.<br />
<br />
== Hardware ==<br />
Since the ARC cluster is a conglomeration of many different compute clusters, the hardware within ARC can vary widely in terms of performance and capabilities. To mitigate any compatibility issues with different hardware, we combine similar hardware into their own Slurm partition to ensure your workload runs as consistently as possible within one partition. Please carefully review the hardware specs for each of the partitions below to avoid any surprises.<br />
<br />
=== Partition Hardware Specs ===<br />
When submitting jobs to ARC, you may specify a partition that your job will run on. Please choose a partition that is most appropriate for your work.<br />
<br />
* See also [[How to find available partitions on ARC]].<br />
<br />
A few things to keep in mind when choosing a partition:<br />
* Specific workloads requiring special Intel Instruction Set Extensions may only work on newer Intel CPUs. <br />
* If working with multi-node parallel processing, ensure your software and libraries support the partition's interconnect networking.<br />
* While older partitions may be slower, they may be less busy and have little to no wait times.<br />
<br />
If you are unsure which partition to use or need assistance on selecting an appropriate partition, please see [[#Selecting_a_Partition|the Selecting a Partition Section]] below. <br />
<br />
{| class="wikitable"<br />
! Partition<br />
! Description<br />
! Nodes<br />
! CPU Cores, Model, and Year<br />
! Memory<br />
! GPU<br />
! Network<br />
|-<br />
| -<br />
| ARC Login Node<br />
| 1<br />
| 16 cores, 2x Intel(R) Xeon(R) CPU E5620 @ 2.40GHz (Westmere, 2010)<br />
| 48 GB<br />
| N/A<br />
| 40 Gbit/s InfiniBand<br />
|-<br />
| gpu-v100<br />
| GPU Parition<br />
| 13<br />
| 80 cores, 2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (Skylake, 2019)<br />
| 754 GB<br />
| 2x Tesla V100-PCIE-16GB<br />
| 100 Gbit/s Omni-Path<br />
|-<br />
|gpu-a100<br />
|GPU Partition<br />
|5<br />
|40 cores, 1x Intel(R) Xeon(R) Silver 4316 CPU @ 2.30GHz (Ice Lake, 2021)<br />
|512 GB<br />
|2x GA100 A100 PCIe 80GB<br />
|100 Gbit/s Mellanox Infiniband<br />
|-<br />
| cpu2021<br />
| General Purpose Compute<br />
| 48<br />
| 48 cores, 2x Intel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz (Cascade Lake, 2021)<br />
| 185 GB<br />
| N/A <br />
| 100 Gbit/s Mellanox Infiniband<br />
|-<br />
| cpu2019<br />
| General Purpose Compute<br />
| 14<br />
| 40 cores, 2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (Skylake, 2019)<br />
| 190 GB<br />
| N/A <br />
| 100 Gbit/s Omni-Path<br />
|-<br />
| apophis<br />
| General Purpose Compute<br />
| 21<br />
| 40 cores, 2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (Skylake, 2019)<br />
| 190 GB<br />
| N/A <br />
| 100 Gbit/s Omni-Path<br />
|-<br />
| razi<br />
| General Purpose Compute<br />
| 41<br />
| 40 cores, 2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (Skylake, 2019)<br />
| 190 GB<br />
| N/A <br />
| 100 Gbit/s Omni-Path<br />
|-<br />
| bigmem<br />
| Big Memory Nodes<br />
| 2<br />
| 80 cores, 4x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (Skylake, 2019)<br />
| 3022 GB<br />
| N/A <br />
| 100 Gbit/s Omni-Path<br />
|-<br />
| pawson<br />
| General Purpose Compute<br />
| 13<br />
| 40 cores, 2x Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz (Skylake, 2019)<br />
| 190 GB<br />
| N/A<br />
| 100 Gbit/s Omni-Path<br />
|-<br />
|cpu2017<br />
|General Purpose Compute<br />
|14<br />
|56 cores, 2x Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz (Sandy Bridge, 2012)<br />
|256 GB<br />
|N/A<br />
|40 Gbit/s InfiniBand<br />
|-<br />
| theia<br />
| Former Theia cluster<br />
| 20<br />
| 56 cores, 2x Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz (Sandy Bridge, 2012)<br />
| 188 GB<br />
| N/A <br />
| 40 Gbit/s InfiniBand<br />
|-<br />
| cpu2013<br />
| Former hyperion cluster<br />
| 12<br />
| 32 cores, 2x Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz (Sandy Bridge, 2012)<br />
| 126 GB<br />
| N/A<br />
| 40 Gbit/s InfiniBand<br />
|-<br />
| lattice<br />
| Former Lattice cluster<br />
| 307<br />
| 8 cores, 2x Intel(R) Xeon(R) CPU L5520 @ 2.27GHz (Nehalem, 2009)<br />
| 12 GB<br />
| N/A<br />
| 40 Gbit/s InfiniBand<br />
|-<br />
| single<br />
| Former Lattice cluster<br />
| 168<br />
| 8 cores, 2x Intel(R) Xeon(R) CPU L5520 @ 2.27GHz (Nehalem, 2009)<br />
| 12 GB<br />
| N/A<br />
| 40 Gbit/s InfiniBand<br />
|-<br />
| parallel<br />
| Former Parallel Cluster<br />
| 576<br />
| 12 cores, 2x Intel(R) Xeon(R) CPU E5649 @ 2.53GHz (Westmere, 2011)<br />
| 24 GB<br />
| N/A<br />
| 40 Gbit/s InfiniBand<br />
|}<br />
<br />
===ARC Cluster Storage===<br />
Usage of ARC cluster storage is outlined by our [[ARC Storage Terms of Use]] page.<br />
<br />
{{Message Box<br />
| title=No Backup Policy!<br />
| message=You are responsible for your own backups. Many researchers will have accounts with Compute Canada and may choose to back up their data there (the Project file system accessible through the Cedar cluster would often be used). <br />
<br />
Please contact us at support@hpc.ucalgary.ca if you want more information about this option.<br />
<br />
You can also back up data to your UofC OneDrive for business allocation see: https://rcs.ucalgary.ca/How_to_transfer_data#rclone:_rsync_for_cloud_storage This allocation starts at 5TB. Contact the support center for questions regarding OneDrive for Business.<br />
}}<br />
<br />
The ARC cluster has around 2 petabyte of shared disk storage available across the entire cluster as well as temporary storage local to each of the compute nodes. Please refer to the individual sections below on the capacity limitations and usage policies. <br />
<br />
Use the <code>arc.quota</code> command on ARC to determine the available space on your various volumes and home directory.<br />
<br />
{| class="wikitable"<br />
!Partition<br />
!Description<br />
!Capacity<br />
|-<br />
|<code>/home</code><br />
|User home directories<br />
|500 GB (per user)<br />
|-<br />
|<code>/work</code><br />
|Research project storage<br />
|Up to 100's of TB<br />
|-<br />
|<code>/scratch</code><br />
|Scratch space for temporary files<br />
|Up to 15 TB<br />
|-<br />
|<code>/tmp</code><br />
|Temporary space local to the compute cluster<br />
|Dependent on available storage on nodes. Verify with <code>df -h</code>.<br />
|-<br />
|<code>/dev/shm</code><br />
|Small temporary in-memory disk space local to the compute cluster<br />
|Dependent on memory size set in your Slurm job.<br />
|}<br />
====<code>/home</code>: Home file system====<br />
Each user has a directory under /home and is the default working directory when logging in to ARC. Each home directory has a per-user quota of 500 GB. This limit is fixed and cannot be increased. Researchers requiring additional storage exceeding what is available on their home directory may use <code>/work</code> and <code>/scratch</code>.<br />
<br />
Note on file sharing: Due to security concerns, permissions set using <code>chmod</code> on your home directory to allow other users to read/write to your home directory be automatically reverted by an automated system process unless an explicit exception is made. If you need to share files with other researchers on the ARC cluster, please write to support@hpc.ucalgary.ca to ask for such an exception.<br />
<br />
====<code>/scratch</code>: Scratch file system for large job-oriented storage====<br />
Associated with each job, under the <code>/scratch</code> directory, a subdirectory is created that can be referenced in job scripts as <code>/scratch/${SLURM_JOB_ID}</code>. You can use that directory for temporary files needed during the course of a job. Up to 15 TB of storage may be used, per user (total for all your jobs) in the <code>/scratch</code> file system. <br />
<br />
Data in <code>/scratch</code> associated with a given job will be deleted automatically, without exception, five days after the job finishes.<br />
<br />
====<code>/work</code>: Work file system for larger projects====<br />
If you need more space than provided in <code>/home</code> and the <code>/scratch</code> job-oriented space is not appropriate for you case, please write to support@hpc.ucalgary.ca with an explanation, including an indication of how much storage you expect to need and for how long. If approved, you will then be assigned a directory under <code>/work</code> with an appropriately large quota.<br />
<br />
====<code>/tmp</code>,<code>/var/tmp</code>: Temporary files====<br />
You may use <code>/tmp</code> or <code>/var/tmp</code> for storing temporary files generated by your job. The <code>/tmp</code> is stored on a disk local to the compute node and is not shared across the cluster. The files stored here will be removed immediately after your job terminates.<br />
<br />
==== <code>/dev/shm</code>, <code>/run/user/$uid</code>: In-memory temporary files ====<br />
<code>/dev/shm</code> and <code>/run/user/$UID</code> is writable location for temporary files backed by virtual memory. This can be used if faster I/O is required. This is ideal for workloads that require many small read/writes to share data between processes or as a fast cache. The amount of data you can write here is dependent on the amount of free memory available to your job. The files stored at these locations will be removed immediately after your job terminates.<br />
<br />
== Software ==<br />
All ARC nodes run the latest version of Rocky Linux 8 with the same set of base software packages. To maintain the stability and consistency of all nodes, any additional dependencies that your software requires must be installed under your account. For your convenience, we have packaged commonly used software packages and dependencies as modules available under <code>/global/software</code>. If your software package is not available as a module, you may also try Anaconda which allows users to manage and install custom packages in an isolated environment.<br />
<br />
For a list of available packages that have been made available, please see [[ARC Software pages]]. <br />
<br />
Please contact us at support@hpc.ucalgary.ca if you need additional software installed.<br />
<br />
==== Modules ====<br />
The setup of the environment for using some of the installed software is through the <code>module</code> command. An overview of [https://www.westgrid.ca//support/modules modules on WestGrid (external link)] is largely applicable to ARC.<br />
<br />
Software packages bundled as a module will be available under <code>/global/software</code> and can be listed with the <code>module avail</code> command.<br />
<syntaxhighlight lang="bash"><br />
$ module avail<br />
</syntaxhighlight><br />
<br />
To enable Python, load the Python module by running:<br />
<syntaxhighlight lang="bash"><br />
$ module load python/anaconda-3.6-5.1.0<br />
</syntaxhighlight><br />
<br />
To unload the Python module, run:<br />
<syntaxhighlight lang="bash"><br />
$ module remove python/anaconda-3.6-5.1.0<br />
</syntaxhighlight><br />
<br />
To see currently loaded modules, run:<br />
<syntaxhighlight lang="bash"><br />
$ module list<br />
</syntaxhighlight><br />
<br />
By default, no modules are loaded on ARC. If you wish to use a specific module, such as the Intel compilers or the Open MPI parallel programming packages, you must load the appropriate module.<br />
<br />
== Job submission ==<br />
<br />
=== Interactive Jobs ===<br />
The ARC login node may be used for such tasks as editing files, compiling programs and running short tests while developing programs. We suggest CPU intensive workloads on the login node be restricted to under 15 minutes as per [[General Cluster Guidelines and Policies|our cluster guidelines]]. For interactive workloads exceeding 15 minutes, use the '''[[Running_jobs#Interactive_jobs|salloc command]]''' to allocate an interactive session on a compute node.<br />
<br />
The default salloc allocation is 1 CPU and 1 GB of memory. Adjust this by specifying <code>-n CPU#</code> and <code>--mem Megabytes</code>. You may request up to 5 hours of CPU time for interactive jobs.<br />
salloc --time=5:00:00 --partition=cpu2019<br />
<br />
Always use salloc or srun to start an interactive job. Do not SSH directly to a compute node as SSH sessions will be refused without an active job running.<br />
<br />
<!-- This information doesn't seem that useful or relevant to running interactive jobs. Move to getting started section?<br />
ARC uses the Linux operating system. The program that responds to your typed commands and allows you to run other programs is called the Linux shell. There are several different shells available, but, by default you will use one called bash. It is useful to have some knowledge of the shell and a variety of other command-line programs that you can use to manipulate files. If you are new to Linux systems, we recommend that you work through one of the many online tutorials that are available, such as the [http://www.ee.surrey.ac.uk/Teaching/Unix/index.html UNIX Tutorial for Beginners (external link)] provided by the University of Surrey. The tutorial covers such fundamental topics, among others, as creating, renaming and deleting files and directories, how to produce a listing of your files and how to tell how much disk space you are using. For a more comprehensive introduction to Linux, see [http://linuxcommand.sourceforge.net/tlcl.php The Linux Command Line (external link)].<br />
--><br />
<br />
=== Running non-interactive jobs (batch processing) ===<br />
Production runs and longer test runs should be submitted as (non-interactive) batch jobs, in which commands to be executed are listed in a script (text file). Batch jobs scripts are submitted using the <code>sbatch</code> command, part of the Slurm job management and scheduling software. #SBATCH directive lines at the beginning of the script are used to specify the resources needed for the job (cores, memory, run time limit and any specialized hardware needed).<br />
<br />
Most of the information on the [https://docs.computecanada.ca/wiki/Running_jobs Running Jobs (external link)] page on the Compute Canada web site is also relevant for submitting and managing batch jobs and reserving processors for interactive work on ARC. One major difference between running jobs on the ARC and Compute Canada clusters is in selecting the type of hardware that should be used for a job. On ARC, you choose the hardware to use primarily by specifying a partition, as described below.<br />
<br />
=== Selecting a Partition ===<br />
There are some aspects to consider when selecting a partition including:<br />
* Resource requirements in terms of memory and CPU cores<br />
* Hardware specific requirements, such as GPU or CPU Instruction Set Extensions<br />
* Partition resource limits and potential wait time<br />
* Software support parallel processing using Message Passing Interface (MPI), OpenMP, etc.<br />
** Eg. MPI for parallel processing can distribute memory across multiple nodes, per-node memory requirements could be lower. Whereas, OpenMP or single process code that is restricted to one node would require a higher memory node.<br />
** Note: MPI code running on hardware with Omni-Path networking should be compiled with Omni-Path networking support. This is provided by loading the <code>openmpi/2.1.3-opa</code> or <code>openmpi/3.1.2-opa</code> modules prior to compiling.<br />
<br />
Since resources that are requested are reserved for your job, please request only as much CPU and memory as your job requires to avoid reducing the cluster efficiency. If you are unsure which partition to use or the specific resource requests that are appropriate for your jobs, please contact us at [mailto:support@hpc.ucalgary.ca support@hpc.ucalgary.ca] and we would be happy to work with you.<br />
<br />
{| class="wikitable" style="width: 100%;"<br />
!Partition<br />
!Description<br />
!Cores/node<br />
!Memory Request Limit<br />
!Time Limit<br />
!GPU<br />
!Networking<br />
|-<br />
|cpu2021<br />
|General Purpose Compute<br />
|48<br />
|185,000 MB<br />
|7 days ‡<br />
|<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|cpu2019<br />
|General Purpose Compute<br />
|40<br />
|185,000 MB<br />
|7 days ‡<br />
|<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|bigmem<br />
|Big Memory Compute<br />
|80<br />
|3,000,000 MB<br />
|24 hours ‡<br />
|<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|gpu-v100<br />
|GPU Compute<br />
|80<br />
|753,000 MB<br />
|24 hours ‡<br />
|2<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|apophis&dagger;<br />
|Private Research Partition<br />
|40<br />
|185,000 MB<br />
|7 days ‡<br />
|<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|razi&dagger;<br />
|Private Research Partition<br />
|40<br />
|185,000 MB<br />
|7 days ‡<br />
|<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|pawson&dagger;<br />
|Private Research Partition<br />
|40<br />
|185,000 MB<br />
|7 days ‡<br />
|<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|sherlock&dagger;<br />
|Private Research Partition<br />
|7<br />
|185,000 MB<br />
|7 days ‡<br />
|<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|theia&dagger;<br />
|Private Research Partition<br />
|28<br />
|188,000 MB<br />
|7 days ‡<br />
|<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|synergy&dagger;<br />
|Private Research Partition<br />
|14<br />
|245,000 MB<br />
|7 days ‡<br />
|<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|cpu2013<br />
|Legacy General Purpose Compute<br />
|16<br />
|120000<br />
|7 days ‡<br />
|<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|lattice<br />
|Legacy General Purpose Compute<br />
|8<br />
|12000<br />
|7 days ‡<br />
|<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|parallel<br />
|Legacy General Purpose Compute<br />
|12<br />
|23000<br />
|7 days ‡<br />
|<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|single<br />
|Legacy Single-Node Job Compute<br />
|8<br />
|12000<br />
|7 days ‡<br />
|<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|cpu2021-bf24<br />
|Back-fill Compute (2021-era hardware, 24h)<br />
|48<br />
|185,000 MB<br />
|24 hours ‡<br />
|<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|cpu2019-bf05<br />
|Back-fill Compute (2019-era hardware, 5h)<br />
|40<br />
|185,000 MB<br />
|5 hours ‡<br />
|<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|cpu2017-bf05<br />
|Back-fill Compute (2017-era hardware, 5h)<br />
|14<br />
|245,000 MB<br />
|5 hours ‡<br />
|<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|+ style="caption-side: bottom; text-align: left; font-weight: normal;" | &dagger; These partitions contain hardware contributed to ARC by particular researchers and should only be used by members of their research groups. However, they have generously allowed their compute nodes to be shared with others outside their research groups for short jobs. A special 'back-fill' or -bf partition is available for use by all ARC users for jobs shorter than 5 hours.<br />‡ As time limits may be changed by administrators to adjust to maintenance schedules or system load, the values given in the tables are not definitive. See the Time limits section below for commands you can use on ARC itself to determine current limits.<br />
|}<br />
<br />
==== Backfill partitions ====<br />
Backfill partitions can be used by all users on ARC for short-term jobs. The hardware backing these partitions are generously contributed by researchers. We recommend including the backfill partitions for short term jobs as it may help reduce your job's wait time and increase the overall cluster throughput.<br />
<br />
Previously, each contributing research group had their own backfill partition. Since June 2021, we have merged:<br />
<br />
* apophis-bf, pawson-bf, and razi-bf into cpu2019-bf05 <br />
* theia-bf and synergy-bf into cpu2017-bf05<br />
<br />
The naming scheme of the backfill partitions is the CPU generation year, followed by -bf and the time limit in hours. For example, cpu2017-bf05 would represent a backfill partition containing processors from 2017 with a time limit of 5 hours.<br />
<br />
==== Hardware resource and job policy limits ====<br />
In addition to the hardware limitations, please be aware that there may also be policy limits imposed on your account for each partition. These limits restrict the number of cores, nodes, or GPUs that can be used at any given time. Since the limits are applied on a partition-by-partition basis, using resources in one partition should not affect the available resources you can use in another partition.<br />
<br />
These limits can be listed by running:<br />
<syntaxhighlight lang="bash"><br />
$ sacctmgr show qos format=Name,MaxWall,MaxTRESPU%20,MaxSubmitJobs<br />
Name MaxWall MaxTRESPU MaxSubmit<br />
---------- ----------- -------------------- ---------<br />
normal 7-00:00:00 2000<br />
breezy 3-00:00:00 cpu=384 2000<br />
gpu 7-00:00:00 13000<br />
cpu2019 7-00:00:00 cpu=240 2000<br />
gpu-v100 1-00:00:00 cpu=80,gres/gpu=4 2000<br />
single 7-00:00:00 cpu=408,node=75 2000<br />
razi 7-00:00:00 2000<br />
</syntaxhighlight><br />
<br />
==== Specifying a partition in a job ====<br />
One you have decided which partitions best suits your computation, you can select one or more partition on a job-by-job basis by including the <code>partition</code> keyword for an <code>SBATCH</code> directive in your batch job. Multiple partitions should be comma separated. If you omit the partition specification, the system will try to assign your job to appropriate hardware based on other aspects of your request. <br />
<br />
In some cases, you really should specify the partition explicitly. For example, if you are running single-node jobs with thread-based parallel processing requesting 8 cores you could use:<br />
<syntaxhighlight lang="bash"><br />
#SBATCH --mem=0 ❶<br />
#SBATCH --nodes=1 ❷<br />
#SBATCH --ntasks=1 ❸<br />
#SBATCH --cpus-per-task=8 ❹<br />
#SBATCH --partition=single,lattice ❺ <br />
</syntaxhighlight><br />
<br />
A few things to mention in this example:<br />
# <code>--mem=0</code> allocates all available memory on the compute node for the job. This effectively allocates the entire node for your job.<br />
# <code>--nodes=1</code> allocates 1 node for the job<br />
# <code>--ntasks=1</code> your job has a single task<br />
# <code>--cpus-per-task=8</code> asks for 8 CPUs per task. This job in total will request 8 * 1, or 8 CPUs.<br />
# <code>--partition=single,lattice</code> specifies that this job can run on either single or lattice.<br />
Suppose that your job requires at most 8 CPU cores and 10 GB of memory. The above Slurm request would be valid and optimal since your job fits neatly in a single node on the single and parallel partition. However, if you failed to specify the partition, Slurm may try to schedule your job to a partition with larger nodes, such as cpu2019 where each node has 40 cores and 190 GB of memory. If your job is scheduled on such a node, your job will be effectively wasting 32 cores and 180 GB of memory because <code>--mem=0</code> not only requests for 190 GB on this node, but also prevents other jobs from being scheduled on the same node.<br />
<br />
If you don't specify a partition, please give greater thought to the memory specification to make sure that the scheduler will not assign your job more resources than are needed.<br />
<br />
Parameters such as '''--ntasks-per-cpu''', '''--cpus-per-task''', '''--mem''' and '''--mem-per-cpu>''' have to be adjusted according to the capabilities of the hardware also. The product of --ntasks-per-cpu and --cpus-per-task should be less than or equal to the number given in the "Cores/node" column. The '''--mem>''' parameter (or the product of '''--mem-per-cpu''' and '''--cpus-per-task''') should be less than the "Memory limit" shown. If using whole nodes, you can specify '''--mem=0''' to request the maximum amount of memory per node.<br />
<br />
===== Examples =====<br />
Here are some examples of specifying the various partitions.<br />
<br />
As mentioned in the [[#Hardware|Hardware]] section above, the ARC cluster was expanded in January 2019. To select the 40-core general purpose nodes specify:<br />
<br />
#SBATCH --partition=cpu2019<br />
<br />
To run on the Tesla V100 GPU-enabled nodes, use the '''gpu-v100''' partition. You will also need to include an SBATCH directive in the form '''--gres=gpu:n''' to specify the number of GPUs, n, that you need. For example, if the software you are running can make use of both GPUs on a gpu-v100 partition compute node, use:<br />
<br />
#SBATCH --partition=gpu-v100 --gres=gpu:2<br />
<br />
For very large memory jobs (more than 185000 MB), specify the bigmem partition:<br />
<br />
#SBATCH --partition=bigmem<br />
<br />
If the more modern computers are too busy or you have a job well-suited to run on the compute nodes described in the legacy hardware section above, choose the cpu2013, Lattice or Parallel compute nodes by specifying the corresponding partition keyword:<br />
<br />
#SBATCH --partition=cpu2013<br />
#SBATCH --partition=lattice<br />
#SBATCH --partition=parallel<br />
<br />
There is an additional partition called '''single''' that provides nodes similar to the lattice partition, but is intended for single-node jobs. Select the single partition with<br />
<br />
#SBATCH --partition=single<br />
<br />
=== Time limits ===<br />
Use the <code>--time</code> directive to tell the job scheduler the maximum time that your job might run. For example:<br />
#SBATCH --time=hh:mm:ss<br />
<br />
You can use <code>scontrol show partitions</code> or <code>sinfo</code> to see the current maximum time that a job can run.<br />
<syntaxhighlight lang="bash" highlight="6"><br />
$ scontrol show partitions<br />
PartitionName=single <br />
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL <br />
AllocNodes=ALL Default=NO QoS=single <br />
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO <br />
MaxNodes=UNLIMITED MaxTime=7-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED <br />
Nodes=cn[001-168] <br />
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO <br />
OverTimeLimit=NONE PreemptMode=OFF <br />
State=UP TotalCPUs=1344 TotalNodes=168 SelectTypeParameters=NONE <br />
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED <br />
</syntaxhighlight><br />
<br />
Alternatively, with <code>sinfo</code> under the <code>TIMELIMIT</code> column:<br />
<syntaxhighlight lang="bash"><br />
$ sinfo <br />
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST <br />
single up 7-00:00:00 1 drain* cn097 <br />
single up 7-00:00:00 1 maint cn002 <br />
single up 7-00:00:00 4 drain* cn[001,061,133,154] <br />
...<br />
</syntaxhighlight><br />
<br />
== Support ==<br />
{{Message Box<br />
|title=[[Support|Need Help or have other ARC Related Questions?]]<br />
|message=For all general RCS related issues, questions, or comments, please contact us at support@hpc.ucalgary.ca.<br />
|icon=Support Icon.png}}<br />
<br />
Please don't hesitate to [[Support|contact us]] directly by email if you need help using ARC or require guidance on migrating your workflows to ARC and running them.<br />
<br />
[[Category:ARC]]<br />
[[Category:Guides]]</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=Support&diff=2351Support2023-03-02T17:14:50Z<p>Tthomas: /* Walk-in Consults */ removed foothills campus walk in consults</p>
<hr />
<div>For any questions about our High Performance Computing systems or other Research Computing Services, there are several ways to get support:<br />
<br />
== Email ==<br />
Send email with your questions regarding HPC and RCS related services to support@hpc.ucalgary.ca.<br />
<br />
We recommend sending emails to our support address rather than a particular staff member in case the person you are trying to reach is away. <br />
<br />
== IT Related Questions ==<br />
For University of Calgary IT support, or issues relating to Email, VPN/Networking, or Desktops, please direct your questions to the IT Service Centre:<br />
* Website: http://ucalgary.ca/it<br />
* Email: itsupport@ucalgary.ca<br />
* Phone: 403-220-5555<br />
<br />
== Compute Canada Related Questions ==<br />
For Compute Canada specific questions, please contact: support@computecanada.ca<br />
<br />
__NOTOC__</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=How_to_transfer_data&diff=1900How to transfer data2022-06-08T18:43:25Z<p>Tthomas: /* Use Globus Web Application to transfer files */</p>
<hr />
<div>{{Message Box<br />
|title=Data Transfer Nodes<br />
|message=<br />
For performance and resource reasons, file transfers should be performed on the data transfer node arc-dtn.ucalgary.ca rather than on the ARC login node. Since the ARC DTN has the same shares as ARC, any files you transfer to the DTN will also be available on ARC.<br />
}}<br />
<br />
<br />
= Command Line File Transfer Tools =<br />
You may use the following command-line file transfer utilities on Linux, MacOS, and Windows. File transfers using these methods require your computer to be on the University of Calgary campus network or connected via the University of Calgary IT General VPN.<br />
<br />
If you are working on a Windows computer, you will need to install these utilities separately as they are not installed by default. Newer versions of Windows 10 (1903 and up) have '''SSH''' built in as part of the '''openssh''' package. However, you may be better off using one of the [[#Graphical File Transfer Tools|graphical file transfer]] tools listed in the following section.<br />
<br />
== <code>scp</code>: Secure Copy ==<br />
<code>scp</code> is a secure and encrypted method of transferring files between machines via SSH. It is available on Linux and Mac computers by default and can be installed on Windows by installing the OpenSSH package.<br />
<br />
The general format for the command is:<br />
$ scp [options] source destination<br />
<br />
* The <code>source</code> and <code>destination</code> fields can be a local file / directory or a remote one.<br />
* The ''local location'' is a normal Unix path, absolute or relative.<br />
* The ''remote location'' has a format <code>username@remote.system.name:file/path</code>.<br />
* The ''remote relative file path'' is relative to the home directory of the <code>username</code> on the remote system.<br />
<br />
You may see all the available options with <code>scp</code> by viewing the [http://man7.org/linux/man-pages/man1/scp.1.html man page].<br />
<br />
=== Example Usage ===<br />
Common operations are given below. On your desktop, to:<br />
* Transfer a single file (eg. <code>data.dat</code>) to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp data.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Transfer all files ending with <code>.dat</code> to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp *.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* To transfer an entire directory to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp -r my_data_directory/ username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
<br />
== <code>rsync</code> ==<br />
<code>rsync</code> is a utility for transferring and synchronizing files efficiently. The efficiency for its file synchronization is achieved by its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. <br />
<br />
<code>rsync</code> can be used to copy files and directories locally on a system or between two computers via SSH. Unlike <code>scp</code>, it is designed to synchronize two locations, so partial transfers can be restarted by re-running <code>rsync</code> without losing progress. Resuming a partial transfer is not possible with <code>scp</code>.<br />
<br />
The general format for the command is similar to '''scp''':<br />
$ rsync [options] source destination<br />
* The <code>source</code> and <code>destination</code> fields can be a local file / directory or a remote one.<br />
* <code>rsync</code> '''cannot''' transfer files between '''two remote''' locations. The source or the destination must be a local path.<br />
* The ''local location'' is a normal Unix path, absolute or relative.<br />
* The ''remote location'' has a format <code>username@remote.system.name:file/path</code>.<br />
* The ''remote relative file path'' is relative to the home directory of the <code>username</code> on the remote system.<br />
<br />
You may see all the available options with <code>rsync</code> by viewing the [http://man7.org/linux/man-pages/man1/rsync.1.html man page].<br />
<br />
=== Example Usage ===<br />
Common operations are given below. On your desktop, to:<br />
<br />
* Upload a single file (eg. <code>data.dat</code>) from your workstation to ARC: <syntaxhighlight lang="bash"><br />
desktop$ rsync -v data.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Upload all files matching a wildcard (eg. ending in <code>*.dat</code>): <syntaxhighlight lang="bash"><br />
$ rsync -v *.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Upload an entire directory (eg. <code>my_data</code> to <code>~/projects/project2</code>): <syntaxhighlight lang="bash"><br />
$ rsync -axv my_data username@arc-dtn.ucalgary.ca:~/projects/project2/<br />
</syntaxhighlight><br />
* Upload more than one directory: <syntaxhighlight lang="bash"><br />
desktop$ rsync -axv my_data1 my_data2 my_data3 username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Download one file (eg. <code>output.dat</code>) from ARC to the current directory on your workstation: <syntaxhighlight lang="bash"><br />
## Note the '.' at the end of the command which references the current working directory on your computer<br />
desktop$ rsync -v username@arc-dtn.ucalgary.ca:projects/project1/output.dat .<br />
</syntaxhighlight><br />
* Download one directory (eg. <code>outputs</code>) from ARC to the current directory on your workstation:<syntaxhighlight lang="bash"><br />
desktop$ rsync -axv username@arc-dtn.ucalgary.ca:projects/project1/outputs .<br />
</syntaxhighlight><br />
<br />
== <code>sftp</code>: secure file transfer protocol ==<br />
<br />
* Manual page on-line: http://man7.org/linux/man-pages/man1/sftp.1.html<br />
<br />
<br />
'''sftp''' is a file transfer program, similar to '''ftp''', <br />
which performs all operations over an encrypted '''ssh''' transport. <br />
It may also use many features of '''ssh''', such as public key authentication and compression.<br />
<br />
<br />
'''sftp''' has an interactive mode, <br />
in which sftp understands a set of commands similar to those of '''ftp'''.<br />
Commands are case insensitive.<br />
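For example, a brief interactive session to upload one file and download another could look like this (the file and directory names are placeholders):<br />
<syntaxhighlight lang="bash"><br />
desktop$ sftp username@arc-dtn.ucalgary.ca<br />
sftp> pwd<br />
sftp> ls<br />
sftp> put data.dat<br />
sftp> get results/output.dat<br />
sftp> exit<br />
</syntaxhighlight><br />
Here <code>put</code> uploads a local file to the current remote directory and <code>get</code> downloads a remote file to the local working directory.<br />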
<br />
== <code>rclone</code>: rsync for cloud storage ==<br />
'''Rclone''' is a command line program to sync files and directories to and from a number of on-line storage services.<br />
<br />
* https://rclone.org/<br />
You may back up your data from arc-dtn to your personal 5 TB UCalgary OneDrive to create a safe second copy at a distance.<br />
<br />
[https://rcs.ucalgary.ca/images/8/8e/Rclone_and_OneDrive_on_arc.pdf detailed rclone configuration instructions]<br />
<br />
Please note, if you are syncing your OneDrive with a PC or Mac, your new backup of your ARC home directory may be auto-replicated to your computer. You may choose not to replicate it using the PC or Mac OneDrive client (help & settings -> settings -> account -> Choose folders).<br />
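As an illustration only (this assumes you have already configured a OneDrive remote named <code>onedrive</code> by following the instructions linked above; the remote and directory names are placeholders), backing up a results directory could look like:<br />
<syntaxhighlight lang="bash"><br />
# List the remotes you have configured<br />
$ rclone listremotes<br />
<br />
# Copy a directory from ARC to OneDrive; re-running only transfers new or changed files<br />
$ rclone copy ~/project_results onedrive:arc_backup/project_results --progress<br />
</syntaxhighlight><br />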
<br />
== <code>curl</code> and <code>wget</code>: downloading from the internet ==<br />
To download a file with the ability to resume a partial download:<br />
 curl -C - -O http://example.com/resource.tar.gz<br />
wget -c http://example.com/resource.tar.gz<br />
<br />
<br />
= Graphical File Transfer Tools =<br />
== FileZilla ==<br />
FileZilla is a free cross-platform file transfer program that can transfer files via FTP and SFTP. <br />
<br />
=== Installation ===<br />
You may obtain Filezilla from the project's official website at: https://filezilla-project.org/download.php?type=client. '''Please note''': The official installer may bundle ads and unwanted software. Be careful when clicking through.<br />
<br />
Alternatively, you may obtain Filezilla from Ninite: https://ninite.com/filezilla<br />
<br />
=== Connecting to ARC ===<br />
If working off campus, first connect to the University of Calgary General VPN. Open Filezilla and connect to <code>arc-dtn.ucalgary.ca</code> on port <code>22</code>.<br />
[[File:Filezilla.jpg|alt=Connecting to ARC using Filezilla|none|thumb|Connecting to ARC using Filezilla]] <br />
<br />
== MobaXterm (Windows) ==<br />
'''MobaXterm''' is the recommended tool for remote access and data transfer in '''Windows''' OSes.<br />
<br />
MobaXterm is a one-stop solution for most remote access work on a compute cluster or a Unix / Linux server.<br />
<br />
It provides many Unix like utilities for Windows including an '''SSH''' client and '''X11''' graphics server. It provides a graphical interface for <br />
data transfer operations.<br />
<br />
* Website: https://mobaxterm.mobatek.net/<br />
<br />
== WinSCP (Windows) ==<br />
WinSCP is a free Windows file transfer tool.<br />
<br />
https://winscp.net/eng/index.php<br />
<br />
= Cloud-based File Transfer Services =<br />
== Globus File Transfer ==<br />
Globus File Transfer is a cloud based service for file transfer and file sharing. It uses GridFTP for high speed and reliable data transfers<br><br />
<br />
* The Alliance Docs on Globus: https://docs.alliancecan.ca/wiki/Globus<br />
<br />
=== How to get started ===<br />
<br />
# Navigate to the web page https://www.globusid.org <br />
# Create a Globus ID and password using your google account.<br />
<br />
=== Use Globus Web Application to transfer files ===<br />
<br />
# To initiate data transfers using the Globus Web Application, navigate to https://www.globusid.org/login and log into your Globus account.<br />
# On the left panel, click on File Manager to define/select the collections you wish to use. For example, to transfer data from the ARC cluster (collection 1) to the Compute Canada cedar cluster (collection 2):<br />
#* Under collection 1, for the ARC data transfer node, search for arc-dtn-collection. You will see it listed with the description "Mapped Collection on UCalgary ARC-DTN endpoint".<br />
#* Select it from the drop-down menu and authenticate your access using your UCalgary IT credentials. This will bring you to your home directory on ARC.<br />
#* Next, for the Compute Canada cedar data transfer node, choose 'computecanada#cedar-dtn' as collection 2 from the drop-down menu. Again, authenticate your access using your Compute Canada credentials, then navigate to the location where you want to transfer the file.<br />
#* Select the file to be transferred from collection 1 and initiate the transfer process.<br />
<br />
=== Use Globus Web Application to Share Files ===<br />
You may grant access to a folder in your allocation on ARC to be used for uploading or sharing files with any individual who has a Globus account, either through Compute Canada or their own institution.<br />
<br />
Please see https://docs.globus.org/how-to/share-files/<br />
<br />
<br />
= Transferring Large Datasets =<br />
=== Using screen and rsync ===<br />
If you want to transfer a large amount of data from a remote Unix system to ARC, you can use ''<code>rsync</code>'' to handle the transfer. However, you will have to keep the SSH session from your workstation connected during the entire transfer, which is often inconvenient or not feasible.<br />
<br />
To overcome this, you can run the '''rsync''' transfer inside a '''screen''' virtual session on ARC. '''screen''' creates a persistent terminal session local to ARC that you can detach from and later reconnect to from a new SSH session from your workstation.<br />
<br />
To begin, login to ARC and start <code>screen</code> with the screen command:<br />
<pre><br />
# Login to ARC<br />
$ ssh username@arc.ucalgary.ca<br />
<br />
# Start a screen session<br />
$ screen<br />
<br />
# While in the new screen session, start the transfer with rsync.<br />
$ rsync -axv ext_user@external.system:path/to/remote/data .<br />
<br />
# Disconnect from the screen session with the hotkey 'Ctrl-a d'<br />
# You may now disconnect from ARC, close the lid of your laptop, or turn off the computer.<br />
</pre><br />
<br />
To check whether the transfer has finished:<br />
<pre><br />
# Login to ARC<br />
$ ssh username@arc.ucalgary.ca<br />
<br />
# Reconnect to the screen session<br />
$ screen -r<br />
<br />
# If the transfer has finished, close the screen session.<br />
$ exit<br />
<br />
# If the transfer is still running, disconnect from the screen session with the hotkey 'Ctrl-a d'<br />
</pre><br />
<br />
=== Very large files ===<br />
If the files are large and the transfer speed is low, the transfer may fail before a file has been completely transferred. <br />
'''rsync''' may not be of much help here: by default it does not keep partially transferred files, so re-running it restarts the interrupted file from the beginning (the <code>--partial</code> option may change this behaviour, but it has not been tested here recently).<br />
<br />
The solution may be to split the large file into smaller chunks, transfer them using rsync and then join them on the remote system (ARC for example):<br />
<br />
<pre><br />
# Large file is 506MB in this example.<br />
$ ls -l t.bin<br />
-rw-r--r-- 1 drozmano drozmano 530308481 Jun 8 11:06 t.bin<br />
<br />
# split the file:<br />
$ split -b 100M t.bin t.bin_chunk.<br />
<br />
# Check the chunks.<br />
$ ls -l t.bin_chunk.*<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.aa<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ab<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ac<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ad<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ae<br />
-rw-r--r-- 1 drozmano drozmano 6020481 Jun 8 11:09 t.bin_chunk.af<br />
<br />
# Transfer the files:<br />
$ rsync -axv t.bin_chunk.* username@arc.ucalgary.ca:<br />
</pre><br />
<br />
Then login to ARC and join the files:<br />
<pre><br />
$ cat t.bin_chunk.* > t.bin<br />
<br />
$ ls -l <br />
-rw-r--r-- 1 drozmano drozmano 530308481 Jun 8 11:06 t.bin<br />
</pre><br />
Success.<br />
<br />
[[Category:Guides]]</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=How_to_get_an_account&diff=1785How to get an account2022-05-16T18:35:54Z<p>Tthomas: /* General */</p>
<hr />
<div>= General =<br />
<br />
Only '''University of Calgary researchers''' are eligible for an account on our HPC systems. <br />
This means that any account applicant has to have an '''active UCalgary IT and Email account''' to be able to get access to ARC.<br />
If you are not a University of Calgary researcher and have collaborative work that requires access to the ARC cluster,<br />
you will have to obtain the status of the [[External collaborators | General Associate]] with UofC <br />
to get an IT account and a UofC email address first. <br />
<br />
<br />
'''Any UofC researcher''' can request an account on the ARC cluster on their own;<br />
for this purpose, '''graduate students''' and "up" are considered researchers.<br />
<br />
'''Undergraduate students''' are not researchers in this sense. <br />
If an undergraduate student is working for a research group and needs access to ARC,<br />
then their account has to be requested by their scientific supervisor.<br />
The supervisor will also have to confirm that the research work that the <br />
student is going to perform is related to the supervisor's area of research<br />
and that the student needs access to the ARC cluster.<br />
<br />
<br />
To apply, please copy and paste the text below into an email to <br />
[mailto:support@hpc.ucalgary.ca support@hpc.ucalgary.ca] <br />
and respond to the questions in the text.<br />
<br />
= Application form =<br />
<br />
'''About myself:'''<br />
<br />
* What is your status with UofC?<br />
<br />
* What research group do you work for? Who is your supervisor?<br />
: (If you are a Principal Investigator yourself, please respond accordingly).<br />
<br />
* How did you learn about the ARC cluster?<br />
<br />
* Do you have any experience with '''Linux''' and / or '''compute clusters'''?<br />
<br />
* Does anybody else in your group use ARC for their work?<br />
<br />
* What problem are you trying to address by using a compute cluster? '''What is lacking''' on your personal computer?<br />
<br />
<br />
'''About the project(s) I am going to work on''': <br />
<br />
* Please tell us briefly about the '''research topic''' you are going to be working on using ARC.<br />
<br />
* What '''data''' are you planning to work on? What '''form''' are they in?<br />
<br />
* What '''kind of analysis''' is it?<br />
<br />
* What '''software''' are you going to be using?<br />
<br />
* Do you have an estimate for the '''amount''' of work?<br />
<br />
<br />
By applying for an ARC account I certify that '''I understand that''':<br />
<br />
<br />
The storage provided by the ARC cluster is only suitable for '''Level 1 and Level 2 data''', as classified according to <br />
the UofC Information '''Security Classification Standard'''.<br><br />
https://www.ucalgary.ca/legal-services/sites/default/files/teams/1/Standards-Legal-Information-Security-Classification-Standard.pdf<br />
<br />
<br />
ARC is a research cluster, which means it has high performance but can be stopped for required maintenance when needed. <br />
ARC's storage cannot and should not be used as the main storage facility for research data, as the data will not be available if the cluster is under maintenance.<br />
The master copy of research data should be stored elsewhere, and only the part of that data needed for computational analysis is expected to be copied to ARC.<br />
Data on ARC's storage are not backed up.<br />
<br />
<br />
Users' accounts on ARC are subject to '''automatic deletion after 12 months of inactivity'''.<br />
Please log in periodically to prevent your account from being deleted. <br />
You will be notified before the account is deleted.<br />
Please note that when an account is deleted, '''all the data''' stored in the home directory of the account are '''deleted''' as well.</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=How_to_get_an_account&diff=1784How to get an account2022-05-16T18:34:20Z<p>Tthomas: /* General */</p>
<hr />
<div>= General =<br />
<br />
Only '''University of Calgary researchers''' are eligible for an account on our HPC systems. <br />
This means that any account applicant has to have an '''active UCalgary IT and Email account''' to be able to get access to ARC.<br />
If you are not a University of Calgary researcher and have collaborative work that requires access to the ARC cluster,<br />
you will have to obtain the status of the [[External collaborators | General Associate]] with UofC <br />
to get an IT account and a UofC email address first. <br />
<br />
'''Any UofC researcher''' can request an account on the ARC cluster on their own;<br />
for this purpose, '''graduate students''' and "up" are considered researchers.<br />
<br />
'''Undergraduate students''' are not researchers in this sense. <br />
If an undergraduate student is working for a research group and needs access to ARC,<br />
then their account has to be requested by their scientific supervisors.<br />
The supervisor will also have to confirm that the research work that the <br />
student is going to perform is related to the supervisor's area of research<br />
and that the student needs access to the ARC cluster.<br />
<br />
To apply, please copy and paste the text below into an email to <br />
[mailto:support@hpc.ucalgary.ca support@hpc.ucalgary.ca] <br />
and respond to the questions in the text.<br />
<br />
= Application form =<br />
<br />
'''About myself:'''<br />
<br />
* What is your status with UofC?<br />
<br />
* What research group do you work for? Who is your supervisor?<br />
: (If you are a Principal Investigator yourself, please respond accordingly).<br />
<br />
* How did you learn about the ARC cluster?<br />
<br />
* Do you have any experience with '''Linux''' and / or '''compute clusters'''?<br />
<br />
* Does anybody else in your group use ARC for their work?<br />
<br />
* What problem are you trying to address by using a compute cluster? '''What is lacking''' on your personal computer?<br />
<br />
<br />
'''About the project(s) I am going to work on''': <br />
<br />
* Please tell us briefly about the '''research topic''' you are going to be working on using ARC.<br />
<br />
* What '''data''' are you planning to work on? What '''form''' are they in?<br />
<br />
* What '''kind of analysis''' is it?<br />
<br />
* What '''software''' are you going to be using?<br />
<br />
* Do you have an estimate for the '''amount''' of work?<br />
<br />
<br />
By applying for an ARC account I certify that '''I understand that''':<br />
<br />
<br />
The storage provided by the ARC cluster is only suitable for '''Level 1 and Level 2 data''', as classified according to <br />
the UofC Information '''Security Classification Standard'''.<br><br />
https://www.ucalgary.ca/legal-services/sites/default/files/teams/1/Standards-Legal-Information-Security-Classification-Standard.pdf<br />
<br />
<br />
ARC is a research cluster, which means it has high performance but can be stopped for required maintenance when needed. <br />
ARC's storage cannot and should not be used as the main storage facility for research data, as the data will not be available if the cluster is under maintenance.<br />
The master copy of research data should be stored elsewhere, and only the part of that data needed for computational analysis is expected to be copied to ARC.<br />
Data on ARC's storage are not backed up.<br />
<br />
<br />
Users' accounts on ARC are subject to '''automatic deletion after 12 months of inactivity'''.<br />
Please log in periodically to prevent your account from being deleted. <br />
You will be notified before the account is deleted.<br />
Please note that when an account is deleted, '''all the data''' stored in the home directory of the account are '''deleted''' as well.</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=List_of_courses_on_TALC&diff=1563List of courses on TALC2021-11-09T17:24:34Z<p>Tthomas: /* A New Course Checklist */</p>
<hr />
<div>{{Message Box<br />
|title=Interested in using TALC for your course?<br />
|message='''If you are the instructor for a course that could benefit from using TALC, please contact us at support@hpc.ucalgary.ca to discuss your requirements.'''<br />
<br >To ensure that the appropriate software is available, student accounts are in place, and appropriate training has been provided for your teaching assistants, it is best to start this discussion several months prior to the start of the course.<br />
|icon=Support Icon.png}}<br />
<br />
= Information =<br />
<br />
* TALC Terms of Use: https://rcs.ucalgary.ca/TALC_Terms_of_Use<br />
<br />
* TALC Guide: https://rcs.ucalgary.ca/TALC_Cluster<br />
<br />
<br />
== A New Course Checklist ==<br />
<br />
'''Three months''' before the course:<br />
* Please contact us about the course you are going to teach using TALC.<br />
<br />
<br />
Soon after:<br />
* Request '''accounts''' on TALC for yourself and the TAs who are going to be helping with the course.<br />
<br />
<br />
* If you are planning to share data with the students, request a '''shared directory for the course'''. <br />
: There will also be a '''unix group''' on TALC to control access to the shared directory.<br />
: The directory name and the unix group will probably have the same name, like "course601-21". <br />
: Your account and the TAs' accounts have to be added to the access group.<br />
<br />
<br />
* The data for courses run on TALC is deleted once the course is over, so the '''shared directory''' and '''software''' for a course have to be set up every time the course is run.<br />
: If you need to build / have some '''specific software''' on TALC for the course, please start early. <br />
: It needs to be '''installed and tested''' well before the course starts, so that a solution can be found in case something does not work.<br />
<br />
<br />
* If you '''need help with setting up the software''' for the course, let RCS support know at support@hpc.ucalgary.ca.<br />
<br />
<br />
* During the course, if students have '''difficulties with using TALC''', the TAs are expected to help the students.<br />
: If TAs need training, this has to be arranged with us (RCS) before the course.<br />
<br />
<br />
* As soon as you have a '''list of students''' who are going to take the course, please send it to support@hpc.ucalgary.ca.<br />
: The list has to have students' '''names''' as well as associated '''UofC email addresses'''.<br />
: The accounts will be created and added to the access group.<br />
<br />
<br />
* If you have any concerns or questions about running a course on TALC please let us know.<br />
<br />
= Current Courses =<br />
<br />
== Fall 2021 ==<br />
<br />
* ENSF 619.01 - Ethan MacDonald<br />
* MDSC 523 - David Anderson<br />
* ENSF 619.02 - Roberto Medeiros de Souza<br />
* ENSF 612 - Gias Uddin and Ajoy Das<br />
<br />
= Previous Courses =<br />
== 2021 Winter ==<br />
* GLGY 605 - Benjamin Tutolo<br />
* BMEN 415 - Ethan MacDonald<br />
* MDSC 201 - David Anderson<br />
<br />
== 2020 Spring ==<br />
* Bioinformatics workshop Q Zhang<br />
* Bioinformatics workshop Q Zhang<br />
<br />
==2020 Winter==<br />
* DATA 623 R Walker<br />
* GLGY 605 B Tutolo<br />
* DATA 608 P Federl<br />
* ENSF 612 J Kaur<br />
<br />
==2020 Winter Block Week==<br />
* MDSC 395 D Anderson<br />
<br />
= Resources =<br />
<br />
== Academic Schedule ==<br />
* See https://www.ucalgary.ca/pubs/calendar/current/academic-schedule.html for the UofC 's academic schedule. <br />
<br />
[[Category:TALC]]</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=List_of_courses_on_TALC&diff=1562List of courses on TALC2021-11-09T17:23:57Z<p>Tthomas: /* A New Course Checklist */</p>
<hr />
<div>{{Message Box<br />
|title=Interested in using TALC for your course?<br />
|message='''If you are the instructor for a course that could benefit from using TALC, please contact us at support@hpc.ucalgary.ca to discuss your requirements.'''<br />
<br >To ensure that the appropriate software is available, student accounts are in place, and appropriate training has been provided for your teaching assistants, it is best to start this discussion several months prior to the start of the course.<br />
|icon=Support Icon.png}}<br />
<br />
= Information =<br />
<br />
* TALC Terms of Use: https://rcs.ucalgary.ca/TALC_Terms_of_Use<br />
<br />
* TALC Guide: https://rcs.ucalgary.ca/TALC_Cluster<br />
<br />
<br />
== A New Course Checklist ==<br />
<br />
'''Three months''' before the course:<br />
* Please contact us about the course you are going to teach using TALC.<br />
<br />
<br />
Soon after:<br />
* Request '''accounts''' on TALC for yourself and the TAs who are going to be helping with the course.<br />
<br />
<br />
* If you are planning to share data with the students, request a '''shared directory for the course'''. <br />
: There will also be a '''unix group''' on TALC to control access to the shared directory.<br />
: The directory name and the unix group will probably have the same name, like "course601-21". <br />
: Your and TA's accounts have to be added to the access group.<br />
<br />
<br />
* The data for courses run on TALC is deleted once the course is over, so the '''shared directory''' and '''software''' for a course have to be set up every time the course is run.<br />
: If you need to build / have some '''specific software''' on TALC for the course, please start early. <br />
: It needs to be '''installed and tested''' well before the course starts, so that a solution can be found in case something does not work.<br />
<br />
<br />
* If you '''need help with setting up the software''' for the course, let the RCS support know at support@hpc.ucalgary.ca .<br />
<br />
<br />
* During the course, if students have '''difficulties with using TALC''', the TAs are expected to help the students.<br />
: If TAs need training, this has to be arranged with us (RCS) before the course.<br />
<br />
<br />
* As soon as you have a '''list of students''' who are going to take the course, please send it to support@hpc.ucalgary.ca .<br />
: The list has to have students' '''names''' as well as associated '''UofC email addresses'''.<br />
: The accounts will be created and added to the access group.<br />
<br />
<br />
* If you have any concerns or questions about running a course on TALC please let us know.<br />
<br />
= Current Courses =<br />
<br />
== Fall 2021 ==<br />
<br />
* ENSF 619.01 - Ethan MacDonald<br />
* MDSC 523 - David Anderson<br />
* ENSF 619.02 - Roberto Medeiros de Souza<br />
* ENSF 612 - Gias Uddin and Ajoy Das<br />
<br />
= Previous Courses =<br />
== 2021 Winter ==<br />
* GLGY 605 - Benjamin Tutolo<br />
* BMEN 415 - Ethan MacDonald<br />
* MDSC 201 - David Anderson<br />
<br />
== 2020 Spring ==<br />
* Bioinformatics workshop Q Zhang<br />
* Bioinformatics workshop Q Zhang<br />
<br />
==2020 Winter==<br />
* DATA 623 R Walker<br />
* GLGY 605 B Tutolo<br />
* DATA 608 P Federl<br />
* ENSF 612 J Kaur<br />
<br />
==2020 Winter Block Week==<br />
* MDSC 395 D Anderson<br />
<br />
= Resources =<br />
<br />
== Academic Schedule ==<br />
* See https://www.ucalgary.ca/pubs/calendar/current/academic-schedule.html for the UofC 's academic schedule. <br />
<br />
[[Category:TALC]]</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=List_of_courses_on_TALC&diff=1561List of courses on TALC2021-11-09T17:23:05Z<p>Tthomas: /* A New Course Checklist */</p>
<hr />
<div>{{Message Box<br />
|title=Interested in using TALC for your course?<br />
|message='''If you are the instructor for a course that could benefit from using TALC, please contact us at support@hpc.ucalgary.ca to discuss your requirements.'''<br />
<br >To ensure that the appropriate software is available, student accounts are in place, and appropriate training has been provided for your teaching assistants, it is best to start this discussion several months prior to the start of the course.<br />
|icon=Support Icon.png}}<br />
<br />
= Information =<br />
<br />
* TALC Terms of Use: https://rcs.ucalgary.ca/TALC_Terms_of_Use<br />
<br />
* TALC Guide: https://rcs.ucalgary.ca/TALC_Cluster<br />
<br />
<br />
== A New Course Checklist ==<br />
<br />
'''Three months''' before the course:<br />
* Please contact us about the course you are going to teach using TALC.<br />
<br />
<br />
Soon after:<br />
* Request '''accounts''' on TALC for yourself and the TAs who are going to be helping with the course.<br />
<br />
<br />
* If you are planning to share data with the students, request a '''shared directory for the course'''. <br />
: There will also be a '''unix group''' on TALC to control access to the shared directory.<br />
: The directory name and the unix group will probably have the same name, like "course601-21". <br />
: Your and TA's accounts have to be added to the access group.<br />
<br />
<br />
* The data for courses run on TALC is deleted once the course is over, so the '''shared directory''' and '''software''' for a course have to be set up every time the course is run.<br />
: If you need to build / have some '''specific software''' on TALC for the course, please start early. <br />
: It needs to be '''installed and tested''' well before the course starts, so that a solution can be found in case something does not work.<br />
<br />
<br />
* If you '''need help with setting up the software''' for the course, let the RCS support know at support@hpc.ucalgary.ca .<br />
<br />
<br />
* During the course, if students have '''difficulties with using TALC''', the TAs are expected to help the students.<br />
: If TAs need training, this has to be arranged with us (RCS) before the course.<br />
<br />
<br />
* As soon as you have a '''list of students''' who are going to take the course, please send it to support@hpc.ucalgary.ca.<br />
: The list has to have students' '''names''' as well as associated '''UofC email addresses'''.<br />
: The accounts will be created and added to the access group.<br />
<br />
<br />
* If you have any concerns or questions about running a course on TALC please let us know.<br />
<br />
= Current Courses =<br />
<br />
== Fall 2021 ==<br />
<br />
* ENSF 619.01 - Ethan MacDonald<br />
* MDSC 523 - David Anderson<br />
* ENSF 619.02 - Roberto Medeiros de Souza<br />
* ENSF 612 - Gias Uddin and Ajoy Das<br />
<br />
= Previous Courses =<br />
== 2021 Winter ==<br />
* GLGY 605 - Benjamin Tutolo<br />
* BMEN 415 - Ethan MacDonald<br />
* MDSC 201 - David Anderson<br />
<br />
== 2020 Spring ==<br />
* Bioinformatics workshop Q Zhang<br />
* Bioinformatics workshop Q Zhang<br />
<br />
==2020 Winter==<br />
* DATA 623 R Walker<br />
* GLGY 605 B Tutolo<br />
* DATA 608 P Federl<br />
* ENSF 612 J Kaur<br />
<br />
==2020 Winter Block Week==<br />
* MDSC 395 D Anderson<br />
<br />
= Resources =<br />
<br />
== Academic Schedule ==<br />
* See https://www.ucalgary.ca/pubs/calendar/current/academic-schedule.html for the UofC 's academic schedule. <br />
<br />
[[Category:TALC]]</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=TALC_Terms_of_Use&diff=1546TALC Terms of Use2021-11-09T16:22:29Z<p>Tthomas: /* Guidelines and Regulations */</p>
<hr />
<div>The [[TALC Cluster Guide|Teaching and Learning Cluster (TALC)]] is a computing resource provided by Research Computing Services (RCS) to support approved courses and workshops. Usage of the cluster is<br />
subject to certain conditions as outlined below and detailed on this page.<br />
<br />
{{Message Box<br />
|title=Questions or Concerns?<br />
|message=Please send all questions and inquiries to support@hpc.ucalgary.ca.<br />
|icon=Support Icon.png}}<br />
<br />
==Guidelines and Regulations==<br />
* Usage must not violate Municipal, Provincial and Federal laws<br />
* Usage must not violate the University's Policies and Procedures outlined in the [https://www.ucalgary.ca/policies/files/policies/acceptable-use-of-electronic-resources-and-information-policy.pdf Acceptable Use of Electronic Resources and Information Policy]<br />
* This account is for your use only. It is made available to you specifically for the course that requires the use of TALC. You must not share your password or let anyone use your account.<br />
* Commercial use of TALC, including digital currency mining, is strictly prohibited.<br />
* TALC is configured for Level 1 and Level 2 data as set forth in the [https://www.ucalgary.ca/policies/files/policies/im010-03-security-standard_0.pdf University of Calgary Information Security Classification Standard] and is not suitable for Level 3 or Level 4 data.<br />
* RCS reserves the right to examine files, programs and any other material used on RCS systems at any time without warning.<br />
* Evidence of inappropriate use of TALC may result in immediate loss of access to your TALC account and may be treated as academic misconduct.<br />
* Please note that no backups are performed for data stored on TALC. It is your responsibility to copy data you need to save elsewhere. By default, your account and data will be deleted one week after the last course associated with the account has finished.<br />
* To ensure that the appropriate software is available, student accounts are in place, and appropriate training has been provided for your teaching assistants, it is best to contact RCS several months prior to the start of the course. <br />
<br />
[[Category:TALC]]</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=TALC_Cluster&diff=1545TALC Cluster2021-11-09T16:03:12Z<p>Tthomas: /* Obtaining an account */</p>
<hr />
<div>{{Message Box<br />
|icon=Security Icon.png<br />
|title=Cybersecurity awareness at the U of C<br />
|message=Please note that there are typically about 950 phishing attempts targeting University of Calgary accounts each month. This is just a reminder to be careful about computer security issues, both at home and at the University. Please visit https://it.ucalgary.ca/it-security for more information, tips on secure computing, and how to report suspected security problems.}}<br />
<br />
This guide gives an overview of the Teaching and Learning Cluster (TALC) at the University of Calgary and is intended to be read by new account holders getting started on TALC. This guide covers topics such as the hardware and performance characteristics, available software, usage policies, and how to log in and run jobs.<br />
<br />
==Introduction==<br />
TALC is a cluster of computers created by Research Computing Services in response to requests for a central computing resource to support academic courses and workshops offered at the University of Calgary. It complements the Advanced Research Computing (ARC) cluster, which is used for research rather than educational purposes. The software environment in the TALC and ARC clusters is very similar, and workflows between the two clusters are identical. What students learn about using TALC will be directly applicable to using ARC should they go on to use ARC for research work.<br />
<br />
If you are the instructor for a course that could benefit from using TALC, please review this guide and the [[TALC Terms of Use]], then contact us at support@hpc.ucalgary.ca to discuss your requirements. To ensure that the appropriate software is available, student accounts are in place, and appropriate training has been provided for your teaching assistants, it is best to start this discussion several months prior to the start of the course.<br />
<br />
If you are a student in a course using TALC, please review this guide for basic instructions in using the cluster. Questions should first be directed to the teaching assistants or instructor for your course.<br />
<br />
===Obtaining an account===<br />
TALC account requests are expected to be submitted by the course instructor rather than by individual students. You must have a University of Calgary IT account in order to use TALC. If you do not have a University of Calgary IT account or email address, please register for one at https://itregport.ucalgary.ca/. To ensure TALC is provisioned in time for a course start date, the instructor should submit the initial list of @ucalgary.ca accounts needed for the course 2 weeks before the start date.<br />
<br />
=== Getting Support ===<br />
{{Message Box<br />
|title=Need Help or have other TALC Related Questions?<br />
|message='''Students''', please send TALC-related questions to your course instructor or teaching assistants.<br /><br />
'''Course instructors and TAs''', please report system issues to support@hpc.ucalgary.ca.<br />
|icon=Support Icon.png}}<br />
<br />
==Hardware==<br />
The TALC cluster is composed of repurposed research clusters that are a few generations old. As a result, individual processor performance will not be comparable to that of the latest processors, but it should be sufficient for educational purposes and course work.<br />
{| class="wikitable"<br />
!Partition<br />
!Description<br />
!Nodes<br />
!CPU Cores, Model, and Year<br />
!Installed Memory<br />
!GPU<br />
!Network<br />
|-<br />
|gpu<br />
|GPU Compute<br />
|3<br />
|12 cores, 2x Intel(R) Xeon(R) Bronze 3204 CPU @ 1.90GHz (2019)<br />
|192 GB<br />
|5x NVIDIA Corporation TU104GL [Tesla T4]<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|cpu24<br />
|General Purpose Compute<br />
|15<br />
|24 cores, 4x Six-Core AMD Opteron(tm) Processor 8431 (2009)<br />
|256 GB<br />
|N/A<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|bigmem<br />
|General Purpose Compute<br />
|2<br />
|32 cores, 4x Intel(R) Xeon(R) CPU E7- 4830 @ 2.13GHz (2015)<br />
|1024 GB<br />
|N/A<br />
|40 Gbit/s InfiniBand<br />
|}<br />
<br />
===Storage===<br />
{{Message Box<br />
| title=No Backup Policy!<br />
| message=You are responsible for your own backups. Since accounts on TALC and related data are removed shortly after the associated course has finished, you should download anything you need to save to your own computer before the end of the course.<br />
}}<br />
<br />
TALC is connected to a network disk storage system. This storage is split across the <code>/home</code> and <code>/scratch</code> file systems. <br />
====<code>/home</code>: Home file system====<br />
Each user has a directory under /home, which is the default working directory when logging in to TALC. Each home directory has a per-user quota of 500 GB. This limit is fixed and cannot be increased.<br />
<br />
Note on file sharing: Due to security concerns, permissions set using <code>chmod</code> on your home directory to allow other users to read/write to it will be automatically reverted by an automated system process unless an explicit exception is made. If you need to share files with other users of the cluster, please write to support@hpc.ucalgary.ca to ask for such an exception.<br />
<br />
====<code>/scratch</code>: Scratch file system for large job-oriented storage====<br />
Associated with each job, under the <code>/scratch</code> directory, a subdirectory is created that can be referenced in job scripts as <code>/scratch/${SLURM_JOB_ID}</code>. You can use that directory for temporary files needed during the course of a job. Up to 30 TB of storage may be used, per user (total for all your jobs) in the <code>/scratch</code> file system. <br />
<br />
Data in <code>/scratch</code> associated with a given job will be deleted automatically, without exception, five days after the job finishes.<br />
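As a minimal sketch (the program and file names are placeholders for your own), a job script could stage temporary files in the per-job scratch directory and copy the results home before the job ends:<br />
<syntaxhighlight lang="bash"><br />
# Inside a batch job script: work in the per-job scratch space<br />
cd /scratch/${SLURM_JOB_ID}<br />
my_program --input ~/input.dat --output tmp_output.dat<br />
<br />
# Copy anything worth keeping back to /home before the job finishes<br />
cp tmp_output.dat ~/results/<br />
</syntaxhighlight><br />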
<br />
== Software ==<br />
{{Message Box<br />
| title=Software Package Requests<br />
| message=Course instructors or teaching assistants should write to support@hpc.ucalgary.ca if additional software is required for their course.<br />
}}<br />
<br />
All TALC nodes run the latest version of CentOS 7 with the same set of base software packages. For your convenience, we have packaged commonly used software packages and dependencies as modules available under <code>/global/software</code>. If your software package is not available as a module, you may also try Anaconda, which allows users to manage and install custom packages in an isolated environment.<br />
<br />
For a list of available packages that have been made available, please see [[ARC Software pages]]. <br />
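For packages that are not available as modules, a typical Anaconda workflow looks roughly like the following sketch (the environment name <code>myenv</code> and the <code>pandas</code> package are placeholders for whatever you actually need; the <code>module</code> command is described in the next section):<br />
<syntaxhighlight lang="bash"><br />
# Load the Anaconda module first<br />
$ module load python/anaconda-3.6-5.1.0<br />
<br />
# Create and activate an isolated environment, then install packages into it<br />
$ conda create --name myenv<br />
$ source activate myenv<br />
$ conda install pandas<br />
</syntaxhighlight><br />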
<br />
=== Modules ===<br />
The setup of the environment for using some of the installed software is through the <code>module</code> command.<br />
<br />
Software packages bundled as a module will be available under <code>/global/software</code> and can be listed with the <code>module avail</code> command.<br />
<syntaxhighlight lang="bash"><br />
$ module avail<br />
</syntaxhighlight><br />
<br />
To enable Python, load the Python module by running:<br />
<syntaxhighlight lang="bash"><br />
$ module load python/anaconda-3.6-5.1.0<br />
</syntaxhighlight><br />
<br />
To unload the Python module, run:<br />
<syntaxhighlight lang="bash"><br />
$ module remove python/anaconda-3.6-5.1.0<br />
</syntaxhighlight><br />
<br />
To see currently loaded modules, run:<br />
<syntaxhighlight lang="bash"><br />
$ module list<br />
</syntaxhighlight><br />
<br />
==Using TALC==<br />
{{Message Box<br />
|title=Usage subject to [[TALC Terms of Use]]<br />
|message=Please review the [[TALC Terms of Use]] prior to using TALC.<br />
|icon=Support Icon.png}}<br />
<br />
===Logging in===<br />
To log in to TALC, connect using SSH to talc.ucalgary.ca. Connections to TALC are accepted only from the University of Calgary network (on campus) or through the University of Calgary General VPN (off campus).<br />
<br />
See [[Connecting to RCS HPC Systems]] for more information.<br />
<br />
===Working interactively===<br />
TALC uses the Linux operating system. The program that responds to your typed commands and allows you to run other programs is called the Linux shell. There are several different shells available, but by default you will use one called bash. It is useful to have some knowledge of the shell and of a variety of other command-line programs that you can use to manipulate files. If you are new to Linux systems, we recommend that you work through one of the many online tutorials that are available, such as the [http://www.ee.surrey.ac.uk/Teaching/Unix/index.html UNIX Tutorial for Beginners (external link)] provided by the University of Surrey. The tutorial covers such fundamental topics as creating, renaming and deleting files and directories, producing a listing of your files, and telling how much disk space you are using. For a more comprehensive introduction to Linux, see [http://linuxcommand.sourceforge.net/tlcl.php The Linux Command Line (external link)].<br />
<br />
The TALC login node may be used for such tasks as editing files, compiling programs and running short tests while developing programs. We suggest CPU intensive workloads on the login node be restricted to under 15 minutes as per [[General Cluster Guidelines and Policies|our cluster guidelines]]. For interactive workloads exceeding 15 minutes, use the '''[[Running_jobs#Interactive_jobs|salloc command]]''' to allocate an interactive session on a compute node.<br />
<br />
The default <code>salloc</code> allocation is 1 CPU and 1 GB of memory. Adjust this by specifying <code>-n CPU#</code> and <code>--mem Megabytes</code>. You may request up to 5 hours of CPU time for interactive jobs.<br />
salloc --time 5:00:00 --partition cpu24 <br />
<br />
<br />
<br />
===Running non-interactive jobs (batch processing)===<br />
Production runs and longer test runs should be submitted as (non-interactive) batch jobs, in which commands to be executed are listed in a script (text file). Batch job scripts are submitted using the <code>sbatch</code> command, part of the Slurm job management and scheduling software. #SBATCH directive lines at the beginning of the script are used to specify the resources needed for the job (cores, memory, run time limit and any specialized hardware needed).<br />
<br />
Most of the information on the Running Jobs page on the Compute Canada web site is also relevant for submitting and managing batch jobs and reserving processors for interactive work on TALC. One major difference between running jobs on the TALC and Compute Canada clusters is in selecting the type of hardware that should be used for a job. On TALC, you choose the hardware to use primarily by specifying a partition, as described below.<br />
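As a minimal illustration (the script and program names are placeholders), a TALC batch job script might look like:<br />
<syntaxhighlight lang="bash"><br />
#!/bin/bash<br />
#SBATCH --partition=cpu24     # one of the TALC partitions described below<br />
#SBATCH --time=01:00:00       # maximum run time<br />
#SBATCH --ntasks=1<br />
#SBATCH --cpus-per-task=1<br />
#SBATCH --mem=4G<br />
<br />
my_program input.dat          # my_program stands in for your own code<br />
</syntaxhighlight><br />
If the script were saved as <code>myjob.slurm</code> (a placeholder name), it would be submitted with <code>sbatch myjob.slurm</code>.<br />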
<br />
===Using JupyterHub on TALC===<br />
TALC has a JupyterHub server, which runs a Jupyter server on one of the TALC compute nodes and provides all the necessary encryption and plumbing to deliver the notebook to your computer. To access this service you must have a TALC account. Point your browser at http://talc.ucalgary.ca and log in with your usual UofC account. As of this writing, the job that runs the Jupyter notebook is allocated 1 CPU and 10 GiB of memory on a cpu24 node.<br />
<br />
===Selecting a partition===<br />
TALC currently has the following partitions available for use. The <code>gpu</code> and <code>cpu12</code> partitions are backed by the same nodes. The <code>cpu12</code> partition was created to only expose the CPUs on the GPU hardware for general purpose use. Each GPU node has 5 Tesla T4 GPUs installed, but you may only request one per job within the TALC environment.<br />
{| class="wikitable"<br />
!Partition<br />
!Description<br />
!Nodes<br />
!Cores<br />
!Memory <br />
!Memory Request Limit<br />
!Time Limit<br />
!GPU Request per Job<br />
!Network<br />
|-<br />
|gpu<br />
|GPU Compute<br />
|3<br />
|12 cores<br />
|192 GB<br />
|190 GB<br />
|24 hours<br />
|1x NVIDIA Corporation TU104GL [Tesla T4]<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|cpu12<br />
|General Purpose Compute<br />
|3<br />
|12 cores<br />
|192 GB<br />
|190 GB<br />
|24 hours<br />
|None<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|cpu24<br />
|General Purpose Compute<br />
|15<br />
|24 cores<br />
|256 GB<br />
|254 GB<br />
|24 hours<br />
|None<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|bigmem<br />
|General Purpose Compute<br />
|2<br />
|32 cores<br />
|1024 GB<br />
|1022 GB<br />
|24 hours<br />
|None<br />
|40 Gbit/s InfiniBand<br />
|}<br />
There are some aspects to consider when selecting a partition, including:<br />
* Resource requirements in terms of memory and CPU cores<br />
* Hardware specific requirements, such as GPU or CPU Instruction Set Extensions<br />
* Partition resource limits and potential wait time<br />
* Software support for parallel processing using Message Passing Interface (MPI), OpenMP, etc.<br />
** E.g., since MPI can distribute memory across multiple nodes, per-node memory requirements could be lower, whereas OpenMP or single-process code that is restricted to one node would require a higher-memory node.<br />
** Note: MPI code running on hardware with Omni-Path networking should be compiled with Omni-Path networking support. This is provided by loading the <code>openmpi/2.1.3-opa</code> or <code>openmpi/3.1.2-opa</code> modules prior to compiling (see the example below).<br />
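For example (a sketch; <code>my_mpi_code.c</code> is a placeholder for your own source file), compiling MPI code against the Omni-Path-enabled OpenMPI build would look like:<br />
<syntaxhighlight lang="bash"><br />
$ module load openmpi/3.1.2-opa<br />
$ mpicc -O2 -o my_mpi_code my_mpi_code.c<br />
</syntaxhighlight><br />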
<br />
Since resources that are requested are reserved for your job, please request only as much CPU and memory as your job requires to avoid reducing the cluster efficiency. If you are unsure which partition to use or the specific resource requests that are appropriate for your jobs, please contact us at [mailto:support@hpc.ucalgary.ca support@hpc.ucalgary.ca] and we would be happy to work with you.<br />
<br />
=== Using a partition ===<br />
<br />
==== Bigmem and compute-only jobs ====<br />
To select the <code>cpu24</code> partition, include the following line in your batch job script:<syntaxhighlight lang="text"><br />
#SBATCH --partition=cpu24<br />
</syntaxhighlight>You may also start an interactive session with <code>salloc</code>:<syntaxhighlight lang="text"><br />
$ salloc --time 1:00:00 -p cpu24<br />
</syntaxhighlight><br />
<br />
==== GPU jobs ====<br />
In TALC, you are limited to exactly 1 GPU per job. Jobs that request 0 GPUs, or 2 or more GPUs, will not be scheduled.<br />
<br />
To submit a job using the <code>gpu</code> partition with one GPU request, include the following to your batch job script:<syntaxhighlight lang="text"><br />
#SBATCH --partition=gpu<br />
#SBATCH --gpus-per-node=1<br />
</syntaxhighlight><br />
<br />
Like the previous example, you may also request interactive sessions with GPU nodes using <code>salloc</code>. Just specify the <code>gpu</code> partition and the number of GPUs required. <syntaxhighlight lang="text"><br />
$ salloc --time 1:00:00 -p gpu -n 1 --gpus-per-node 1 <br />
</syntaxhighlight>You may verify that a GPU was assigned to your job or interactive session by running <code>nvidia-smi</code>. This command will show you the status of the GPU that was assigned to you.<syntaxhighlight lang="text"><br />
$ nvidia-smi<br />
+-----------------------------------------------------------------------------+<br />
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |<br />
|-------------------------------+----------------------+----------------------+<br />
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |<br />
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |<br />
| | | MIG M. |<br />
|===============================+======================+======================|<br />
| 0 Tesla T4 Off | 00000000:3B:00.0 Off | 0 |<br />
| N/A 36C P0 14W / 70W | 0MiB / 15109MiB | 5% Default |<br />
| | | N/A |<br />
+-------------------------------+----------------------+----------------------+<br />
<br />
+-----------------------------------------------------------------------------+<br />
| Processes: |<br />
| GPU GI CI PID Type Process name GPU Memory |<br />
| ID ID Usage |<br />
|=============================================================================|<br />
| No running processes found |<br />
+-----------------------------------------------------------------------------+<br />
</syntaxhighlight><br />
<br />
==== Partition limitations ====<br />
In addition to the hardware limitations of the nodes within the partition, please be aware that there may also be policy limits imposed on your account for each partition. These limits restrict the number of cores, nodes, or GPUs that can be used at any given time. Since the limits are applied on a partition-by-partition basis, using resources in one partition should not affect the available resources you can use in another partition.<br />
<br />
These limits can be listed by running:<br />
<syntaxhighlight lang="bash"><br />
$ sacctmgr show qos format=Name,MaxWall,MaxTRESPU%20,MaxSubmitJobs<br />
Name MaxWall MaxTRESPU MaxSubmit<br />
---------- ----------- -------------------- ---------<br />
normal 1-00:00:00 mem=127000M <br />
cpu24 1-00:00:00 mem=127G <br />
bigmem 1-00:00:00 <br />
gpu gres/gpu=1 <br />
</syntaxhighlight><br />
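<br />
To see how your own running and pending jobs compare against these limits, you can list them with <code>squeue</code>:<syntaxhighlight lang="bash"><br />
$ squeue -u $USER<br />
</syntaxhighlight><br />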
<br />
=== Time limits ===<br />
Use the <code>--time</code> directive to tell the job scheduler the maximum time that your job might run. For example:<br />
#SBATCH --time=hh:mm:ss<br />
<br />
You can use <code>scontrol show partitions</code> or <code>sinfo</code> to see the current maximum time that a job can run.<br />
<syntaxhighlight lang="bash" highlight="6"><br />
$ scontrol show partitions<br />
PartitionName=single <br />
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL <br />
AllocNodes=ALL Default=NO QoS=single <br />
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO <br />
MaxNodes=UNLIMITED MaxTime=7-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED <br />
Nodes=cn[001-168] <br />
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO <br />
OverTimeLimit=NONE PreemptMode=OFF <br />
State=UP TotalCPUs=1344 TotalNodes=168 SelectTypeParameters=NONE <br />
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED <br />
</syntaxhighlight><br />
<br />
Alternatively, with <code>sinfo</code> under the <code>TIMELIMIT</code> column:<br />
<syntaxhighlight lang="bash"><br />
$ sinfo <br />
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST <br />
single up 7-00:00:00 1 drain* cn097 <br />
single up 7-00:00:00 1 maint cn002 <br />
single up 7-00:00:00 4 drain* cn[001,061,133,154] <br />
...<br />
</syntaxhighlight><br />
[[Category:TALC]]<br />
[[Category:Guides]]</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=Storage_Options&diff=1538Storage Options2021-10-21T18:40:19Z<p>Tthomas: /* OneDrive for Business */</p>
<hr />
<div>There are a few options researchers can take advantage of when storing their research data. <br />
<br />
== Data Classification ==<br />
Please review the different data classifications that are outlined by the [https://ucalgary.ca/legal-services/sites/default/files/teams/1/Standards-Legal-Information-Security-Classification-Standard.pdf Information Security Classification Standard]. There are 4 levels of data classification which are summarized in the table below.<br />
<br />
{| class="wikitable"<br />
! Level<br />
! Description<br />
! Example<br />
|-<br />
| Level 1<br />
| Public<br />
|<br />
* Reference data sets<br />
* Published research data<br />
|-<br />
| Level 2<br />
| Internal<br />
|<br />
* Internal memos<br />
* Unpublished research data<br />
* Anonymized or de-identified human subject data<br />
* Library transactions and journals<br />
|-<br />
| Level 3<br />
| Confidential<br />
|<br />
* Faculty/staff employment applications, personnel files, contact information<br />
* Donor or prospective donor information<br />
* Contracts<br />
* Intellectual property<br />
|-<br />
| Level 4<br />
| Restricted<br />
|<br />
* Patient identifiable health information<br />
* identifiable human subject research data<br />
* information subject to special government requirements<br />
|}<br />
<br />
When selecting a storage option, you must use one that meets or exceeds the rated security classification.<br />
<br />
* See also the Collaboration, storage and file shares article in Service Now:<br />
: https://ucalgary.service-now.com/it?id=it_catalog_by_category&sys_id=4dbb82ee13661200c524fc04e144b044<br />
<br />
== Research Data Management ==<br />
We recommend you follow good Research Data Management practices and ensure you have a DMP (Data Management Plan) created to guide your data's lifecycle. DMP Assistant has been created specifically for Canadian scholars and aims to meet any and all Tri-Agency requirements. See: https://assistant.portagenetwork.ca/<br />
<br />
Your DMP can help us support the FAIR (findable, accessible, interoperable and reusable) principles for data management.<br />
<br />
Please consider contacting Libraries and Cultural Resources for assistance. For guidance on general data management and developing a DMP, consult https://library.ucalgary.ca/guides/researchdatamanagement or contact research.data@ucalgary.ca.<br />
<br />
For support using PRISM Dataverse, UofC's institutional data repository, contact digitize@ucalgary.ca.<br />
<br />
If you need to share and preserve your large post-publication data set for a mandated period of time, please visit https://www.frdr-dfdr.ca/repo/ in order to learn more about the national Federated Research Data Repository. <br />
<br />
FRDR aligns with Tri-Agency Principles as a platform for Preservation, Retention and Sharing of research data. see: [http://www.science.gc.ca/eic/site/063.nsf/eng/h_83F7624E.html Tri-Agency Statement of Principles on Digital Data Management]<br />
<br />
<br />
== Secure Compute Data Storage (SCDS) ==<br />
Secure Computing Data Storage (SCDS) is a service provided by Research Computing Services that allows researchers to store restricted and confidential data. Collaboration with Level 4 data stored in SCDS is possible using ShareFile, a secure file sharing and collaboration tool by Citrix.<br />
<br />
{| class="wikitable"<br />
! Capacity<br />
| 10 GB or more<br />
|-<br />
! Classification<br />
| Level 4<br />
|-<br />
! Learn More<br />
| Visit [https://it.ucalgary.ca/secure-computing-platform The SCDS Website]<br />
|-<br />
! Request Access<br />
| Visit [https://ucalgary.service-now.com/kb_view.do?sysparm_article=KB0030163 ServiceNow to request access]<br />
|}<br />
<br />
== AcademicFS ==<br />
AcademicFS is a UofC hosted SMB/CIFS storage solution funded and operated by RCS. It is available by request to faculty and staff with active research data.<br />
{| class="wikitable"<br />
! Capacity<br />
| 100GB with quota increases available on request. <br />
|-<br />
! Classification<br />
| Level 1 - 2<br />
|-<br />
! Request Access<br />
| Visit [https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=fe66b3a7db297300897e4b8b0b96199d ServiceNow to request access]<br />
|}<br />
=== Service Description ===<br />
You may use AcademicFS to store your active research data files. AcademicFS is intended to be used as a research group or project share. AcademicFS is available on campus or off campus using the IT supported VPN client. Information on how to download and install the VPN client can be found here: https://ucalgary.service-now.com/it?id=kb_article&sys_id=880e71071381ae006f3afbb2e144b05c (IT account login may be required).<br />
All AcademicFS users must have a UofC IT account.<br />
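<br />
As an illustration only, a Linux client with <code>cifs-utils</code> installed could mount an AcademicFS share along the following lines; the actual server name and share path are provided when your share is created, so the path below is a placeholder:<syntaxhighlight lang="bash"><br />
$ sudo mount -t cifs //SERVER/SHARE /mnt/academicfs -o username=your_it_username,vers=3.0<br />
</syntaxhighlight><br />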
<br />
=== Data recovery ===<br />
AcademicFS takes daily snapshots shortly after midnight and retains them for 30 days. You should be able to recover a deleted file for up to 30 days, provided the file was present in your share overnight. If you create and delete a file within the same day, no snapshot will be available to recover it from. AcademicFS exposes these snapshots through the Windows 'Previous Versions' feature. If you are not familiar with using this, or if you are on a Linux or macOS device, you can request a restore through ServiceNow.<br />
<br />
For backup, we replicate changes to a distant data center every hour. The storage hardware that hosts your data is located in the basement of the Math Sciences building and the backup is in the HRIC building, so in the case of an on-campus disaster your data should be safe.<br />
<br />
=== Support for AcademicFS ===<br />
If you have questions, please contact the IT Support Centre.<br />
: Mon – Fri: 8:30 am – 5:00 pm; Sat, Sun & holidays: 10:00 am – 2:00 pm.<br />
: Live Chat: ucalgary.ca/it<br />
: Email: itsupport@ucalgary.ca<br />
: Phone: 403.220.5555<br />
: In person: 773 Math Science<br />
<br />
<br />
<br />
== OneDrive for Business ==<br />
OneDrive for Business is a storage solution provided by Microsoft and is available by request to all faculty and staff.<br />
{| class="wikitable"<br />
! Capacity<br />
| 5 TB<br />
|-<br />
! Classification<br />
| Level 1 - 4<br />
|-<br />
! Request Access<br />
| Visit [https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=522b68ebdb83e700897e4b8b0b961997 ServiceNow to request access]<br />
|}<br />
<br />
You may use OneDrive for Business to store your personal and work related files. <br />
Files stored within OneDrive are private to you by default, but can be shared with others for collaboration. <br />
OneDrive for Business cannot be used as a department or project share space.<br />
There is no group/lab offering with OneDrive. <br />
<br />
<br />
<br />
While OneDrive provides a secure and compliant location from an IT Security standpoint, <br />
it is not the most appropriate location for data that the PI remains accountable for during the five years following completion of a study. <br />
This is a data management issue rather than a security issue.<br />
<br />
For example, if a study stored all of its records in the personal OneDrive of one of the researchers, <br />
and that researcher left the university, the OneDrive account and its contents would be removed 30 days later.<br />
<br />
<br />
<br />
Microsoft provides automation capabilities for its O365 products.<br />
If you have a Windows machine, you can use the automation product 'Flow' to copy a file to a local file system whenever a new file is created in OneDrive.<br />
<br />
To back up data residing on ARC to your personal OneDrive allocation please see: [[How to transfer data#rclone: rsync for cloud storage]]<br />
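<br />
As a minimal sketch, assuming you have already configured an rclone remote named <code>onedrive</code> by following that guide (the source and destination paths here are examples only):<syntaxhighlight lang="bash"><br />
$ rclone copy /work/my_lab/results onedrive:arc-backup/results --progress<br />
</syntaxhighlight><br />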
<br />
OneDrive requires that Multi-Factor Authentication (MFA) be enabled on your University of Calgary IT account. <br />
<br />
<br />
UofC OneDrive data is reportedly hosted in Canada (Markham, Ontario).<br />
<br />
===Request Access===<br />
To request OneDrive for Business:<br />
#Submit your request on ServiceNow using the OneDrive for Business request form (https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=522b68ebdb83e700897e4b8b0b961997)<br />
#The IT Support Centre will contact you<br />
#Set a time with IT Support Centre to turn on MFA<br />
# Turn on MFA<br />
#Turn on OneDrive for Business<br />
<br />
===Support for OneDrive for Business===<br />
If you have questions, please contact the IT Support Centre.<br />
: Mon – Fri: 8:30 am – 5:00 pm; Sat, Sun & holidays: 10:00 am – 2:00 pm.<br />
:Live Chat: ucalgary.ca/it<br />
:Email: itsupport@ucalgary.ca<br />
: Phone: 403.220.5555<br />
:In person: 773 Math Science<br />
<br />
===Data recovery===<br />
<br />
===Other Resources===<br />
For more information on OneDrive for Business:<br />
* Operating Level of Agreement KB0032404 (https://ucalgary.service-now.com/it?id=kb_article&sys_id=7f57bddcdb56a3047cab5068dc9619b6)<br />
*OneDrive for Business Getting Started KB0032351 (https://ucalgary.service-now.com/it?id=kb_article&sys_id=60994170db2da7487cab5068dc961900)<br />
*If you are above 90% of your OneDrive quota, you can request an increase here: https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=438e6d8313896a0053f2d7b2e144b0b9 PLEASE NOTE: Microsoft will only increase an allocation while the cloud storage is more than 90% full. Please log into your O365 cloud account to check your current usage before making your request.<br />
<br />
Any questions regarding whether data hosted on OneDrive is subject to US jurisdiction discovery or access should be directed to:<br />
*https://cumming.ucalgary.ca/research-institutes/csm-research-services/legal-research-services (CSM researchers)<br />
*https://research.ucalgary.ca/contact/research-services (non-CSM researchers)<br />
*https://www.ucalgary.ca/legalservices/ (for teaching/learning and other non-research enquiries that make their way to you)<br />
<br />
==Office365 SharePoint for research groups==<br />
<br />
To be determined....<br />
<br />
In the future, researchers will be able to request an Office 365 SharePoint site for a group, <br />
which could serve as a group cloud sharing platform.<br />
<br />
==Personal storage options==<br />
For personal or level 1 data, you may use external solutions from WestGrid or Compute Canada.<br />
<br />
*'''WestGrid ownCloud''':<br />
:Information: https://www.westgrid.ca/resources_services/data_storage/cloud_storage<br />
:Youtube video introduction: https://www.youtube.com/watch?time_continue=6&v=szPNNySx_Hk&feature=emb_logo<br />
:Access portal: https://owncloud.westgrid.ca/<br />
*'''Compute Canada NextCloud''':<br />
:https://nextcloud.computecanada.ca<br />
<br />
== Home Directories and Research Group Allocations on ARC ==<br />
<br />
ARC storage is used to support workflows on the ARC computing cluster. The expectation is that storage on ARC will only be used for active and upcoming computational projects. It is not suitable for long-term or archival storage as it is not backed-up and is not guaranteed to be available for the time periods that are typical of archiving.<br />
<br />
=== Home Directories ===<br />
Every user account on ARC has a static 500GB allocation of storage and a maximum of 1.5 million files (including directories). This cannot be increased or decreased. Home directory storage is connected via a network file system to the rest of the cluster and supports fast data transfer to memory on compute nodes. This also means that basic file system commands (like <code>ls</code>, <code>find</code>, and <code>du</code>) take longer to run as the number of files in your home directory increases. In particular, we strongly encourage users to stay under 100000 files if it is at all possible. This can be achieved by combining smaller data files into single larger files, using structured data formats rather than large number of text files, or combining collections of files that will be used together into archives (tar, dar, etc). Since top level permissions on home directories are set to prevent other users from reading or executing, home directories are not suitable for sharing data directly with colleagues working on ARC. A Research Group Allocation is a more appropriate place for storing shared data or very large data sets that will be used as part of active computational projects. <br />
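<br />
For example, a minimal sketch of bundling a directory of many small files into a single archive before storing it in your home directory (the directory name is an example only):<syntaxhighlight lang="bash"><br />
$ tar -czf run_outputs.tar.gz run_outputs/<br />
$ ls -lh run_outputs.tar.gz<br />
</syntaxhighlight><br />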
<br />
=== Research Group Allocations (<code>/work</code> and <code>/bulk</code>) ===<br />
The principal investigator (PI) for a research group may request an extended shared allocation for the research group by contacting support@hpc.ucalgary.ca with answers to the following questions (please copy the full text of the questions into your email and write answers under it):<br />
<br />
* How much storage is requested and why is that the amount that you need? <br />
A rationale for a request can be a formal data management plan or something more informal, such as a rough estimate of the size of the primary dataset used for a project and of the outputs expected from the computations you plan to run on ARC. <br />
<br />
Example 1: "We will be processing a 3T dataset consisting of 1000 experimental runs. Each experiment will be processed to produce a 6GB output and we will need some further space for post processing. We would like to request 12TB total." <br />
<br />
Example 2: "Our research group has 5 members with separate projects. 3 have projects that will use 1TB of data and 2 have projects that will require 3TB of data. We would like to request 10TB total."<br />
<br />
* What is the requested allocation name? (typically something like <PI name>_lab)<br />
* What is the data classification using the University of Calgary data security classification system?<br />
* Which user or users would be the owner of the allocation? (Full Name and UCalgary Email address, typically the requesting PI but there may be co-PIs)<br />
* Which members of the allocation should be able to request access for new users? (Full Name and UCalgary Email address for active ARC users) <br />
* What is the faculty of the owner or owners? <br />
* Please provide a short description of the lab or project that will use the allocation.<br />
<br />
Work and Bulk storage can be considerably larger than the home directory allocations. However, there are limits on what RCS can provide as ARC storage provides high speed access and is expensive to purchase. Typically, any request over 10TB will require some discussion. Work and Bulk allocations differ in a few ways that influence how they are used. Work storage is faster to access as part of computational jobs on ARC although the impact is small for jobs that don't involve enormous numbers of reads. Bulk storage is designed to be a target for instrument data (which is typically processed in a way that reads data a small number of times per job) and is capable of mounting instruments elsewhere on campus using SMB. A number of questions come up frequently about Work and Bulk storage and these are addressed in an [[Group Storage Allocation FAQ | FAQ]].<br />
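<br />
To get a rough sense of how much of a group allocation is currently in use, you could run something like the following from a login node (the allocation path is a placeholder):<syntaxhighlight lang="bash"><br />
$ du -sh /work/my_lab<br />
</syntaxhighlight><br />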
<br />
<br />
[[Category:Guides]]</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=Storage_Options&diff=1515Storage Options2021-10-13T19:12:38Z<p>Tthomas: /* Other Resources */</p>
<hr />
<div>There are a few options researchers can take advantage of when storing their research data. <br />
<br />
== Data Classification ==<br />
Please review the different data classifications that are outlined by the [https://ucalgary.ca/legal-services/sites/default/files/teams/1/Standards-Legal-Information-Security-Classification-Standard.pdf Information Security Classification Standard]. There are 4 levels of data classification which are summarized in the table below.<br />
<br />
{| class="wikitable"<br />
! Level<br />
! Description<br />
! Example<br />
|-<br />
| Level 1<br />
| Public<br />
|<br />
* Reference data sets<br />
* Published research data<br />
|-<br />
| Level 2<br />
| Internal<br />
|<br />
* Internal memos<br />
* Unpublished research data<br />
* Anonymized or de-identified human subject data<br />
* Library transactions and journals<br />
|-<br />
| Level 3<br />
| Confidential<br />
|<br />
* Faculty/staff employment applications, personnel files, contact information<br />
* Donor or prospective donor information<br />
* Contracts<br />
* Intellectual property<br />
|-<br />
| Level 4<br />
| Restricted<br />
|<br />
* Patient identifiable health information<br />
* identifiable human subject research data<br />
* information subject to special government requirements<br />
|}<br />
<br />
When selecting a storage option, you must use one that meets or exceeds the rated security classification.<br />
<br />
* See also the Collaboration, storage and file shares article in Service Now:<br />
: https://ucalgary.service-now.com/it?id=it_catalog_by_category&sys_id=4dbb82ee13661200c524fc04e144b044<br />
<br />
== Research Data Management ==<br />
We recommend you follow good Research Data Management practices and ensure you have a DMP (Data Management Plan) created to guide your data's lifecycle. DMP Assistant has been created specifically for Canadian scholars and aims to meet any and all Tri-Agency requirements. See: https://assistant.portagenetwork.ca/<br />
<br />
Your DMP can help us support the FAIR (findable, accessible, interoperable and reusable) principles for data management.<br />
<br />
Please consider contacting Libraries and Cultural Resources for assistance. For guidance on general data management and developing a DMP, consult https://library.ucalgary.ca/guides/researchdatamanagement or contact research.data@ucalgary.ca.<br />
<br />
For support using PRISM Dataverse, UofC's institutional data repository, contact digitize@ucalgary.ca.<br />
<br />
If you need to share and preserve your large post-publication data set for a mandated period of time, please visit https://www.frdr-dfdr.ca/repo/ in order to learn more about the national Federated Research Data Repository. <br />
<br />
FRDR aligns with Tri-Agency Principles as a platform for Preservation, Retention and Sharing of research data. see: [http://www.science.gc.ca/eic/site/063.nsf/eng/h_83F7624E.html Tri-Agency Statement of Principles on Digital Data Management]<br />
<br />
<br />
== Secure Compute Data Storage (SCDS) ==<br />
Secure Computing Data Storage (SCDS) is a service provided by Research Computing Services that allows researchers to store restricted and confidential data. Collaboration with Level 4 data stored in SCDS is possible using ShareFile, a secure file sharing and collaboration tool by Citrix.<br />
<br />
{| class="wikitable"<br />
! Capacity<br />
| 10 GB or more<br />
|-<br />
! Classification<br />
| Level 4<br />
|-<br />
! Learn More<br />
| Visit [https://it.ucalgary.ca/secure-computing-platform The SCDS Website]<br />
|-<br />
! Request Access<br />
| Visit [https://ucalgary.service-now.com/kb_view.do?sysparm_article=KB0030163 ServiceNow to request access]<br />
|}<br />
<br />
== AcademicFS ==<br />
AcademicFS is a UofC hosted SMB/CIFS storage solution funded and operated by RCS. It is available by request to faculty and staff with active research data.<br />
{| class="wikitable"<br />
! Capacity<br />
| 100GB with quota increases available on request. <br />
|-<br />
! Classification<br />
| Level 1 - 2<br />
|-<br />
! Request Access<br />
| Visit [https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=fe66b3a7db297300897e4b8b0b96199d ServiceNow to request access]<br />
|}<br />
=== Service Description ===<br />
You may use AcademicFS to store your active research data files. AcademicFS is intended to be used as a research group or project share. AcademicFS is available on campus or off campus using the IT supported VPN client. Information on how to download and install the VPN client can be found here: https://ucalgary.service-now.com/it?id=kb_article&sys_id=880e71071381ae006f3afbb2e144b05c (IT account login may be required).<br />
All AcademicFS users must have a UofC IT account.<br />
<br />
=== Data recovery ===<br />
AcademicFS does daily snapshots at a bit past midnight, which it keeps for 30 days. You should be able to recover a deleted file for up to 30 days, if it was in your share overnight. If you create a file and delete it during a day, no snapshot will be available for you to recover. AcademicFS presents backups using the windows OS 'previous versions' functionality. If you are not familiar with using this, or if you are on a Linux or MacOS device, you can request a restore, with Service Now.<br />
<br />
For backup, we replicate changes to a distant data center every hour. The storage hardware which hosts your data is located in the basement of the Math Sciences building and our backup is in the HRIC building, so in case of an on campus disaster, your data should be safe.<br />
<br />
=== Support for AcademicFS ===<br />
If you have questions, please contact the IT Support Centre.<br />
: Mon – Fri: 8:30 am – 5:00 pm; Sat, Sun & holidays: 10:00 am – 2:00 pm.<br />
: Live Chat: ucalgary.ca/it<br />
: Email: itsupport@ucalgary.ca<br />
: Phone: 403.220.5555<br />
: In person: 773 Math Science<br />
<br />
<br />
<br />
== OneDrive for Business ==<br />
OneDrive for Business is a storage solution provided by Microsoft and is available by request to all faculty and staff.<br />
{| class="wikitable"<br />
! Capacity<br />
| 5 TB<br />
|-<br />
! Classification<br />
| Level 1 - 4<br />
|-<br />
! Request Access<br />
| Visit [https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=522b68ebdb83e700897e4b8b0b961997 ServiceNow to request access]<br />
|}<br />
<br />
You may use OneDrive for Business to store your personal and work related files. <br />
Files stored within OneDrive are private to you by default, but can be shared with others for collaboration. <br />
OneDrive for Business cannot be used as a department or project share space.<br />
There is no group/lab offering with OneDrive. <br />
<br />
<br />
While OneDrive provides a secure/compliant location from an IT Security stand point, <br />
it’s not the most adequate location for data the PI is accountable for 5 years upon completion of the study. <br />
This is not a security issue, but a data management issue.<br />
OneDrive could be great for short term needs, <br />
but '''SCDS''' (see above) is the best option to store all research related electronic records for the long run. <br />
For example, if a study was using a personal OneDrive of one of the researchers to store all the records, <br />
and the researcher was to leave the university, this OneDrive would be gone in 30 days. <br />
That would never happen with SCDS where we are able to track discrete containers which remain associated to the PI and the REB project as one entity.<br />
<br />
<br />
MS has an automation capability for their O365 products.<br />
If you have a windows OS machine, you can use the automation product ‘Flow’ to copy a file to a local file system when a new file is created on OneDrive.<br />
<br />
To back up data residing on ARC to your personal OneDrive allocation please see: [[How to transfer data#rclone: rsync for cloud storage]]<br />
<br />
OneDrive requires Multi-Factor Authentication (MFA) enabled on your University of Calgary IT account. <br />
<br />
<br />
UofC OneDrive data is reportedly hosted in Canada (Markham Ont).<br />
<br />
===Request Access===<br />
To request OneDrive for Business:<br />
#Submit your request on ServiceNow using the OneDrive for Business request form (https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=522b68ebdb83e700897e4b8b0b961997)<br />
#The IT Support Centre will contact you<br />
#Set a time with IT Support Centre to turn on MFA<br />
# Turn on MFA<br />
#Turn on OneDrive for Business<br />
<br />
===Support for OneDrive for Business===<br />
If you have questions, please contact the IT Support Centre.<br />
: Mon – Fri: 8:30 am – 5:00 pm; Sat, Sun & holidays: 10:00 am – 2:00 pm.<br />
:Live Chat: ucalgary.ca/it<br />
:Email: itsupport@ucalgary.ca<br />
: Phone: 403.220.5555<br />
:In person: 773 Math Science<br />
<br />
===Data recovery===<br />
<br />
===Other Resources===<br />
For more information on OneDrive for Business:<br />
* Operating Level of Agreement KB0032404 (https://ucalgary.service-now.com/it?id=kb_article&sys_id=7f57bddcdb56a3047cab5068dc9619b6)<br />
*OneDrive for Business Getting Started KB0032351 (https://ucalgary.service-now.com/it?id=kb_article&sys_id=60994170db2da7487cab5068dc961900)<br />
*If you are above 90% of your OneDrive quota, you can request an increase here: ( https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=438e6d8313896a0053f2d7b2e144b0b9) PLEASE NOTE: Microsoft will only increase an allocation while the Cloud Storage is more than 90% full. Please log into your O365 cloud account to review before making your request.<br />
<br />
Any questions regarding whether data hosted on OneDrive is subject to US jurisdiction discovery or access should be directed to:<br />
*https://cumming.ucalgary.ca/research-institutes/csm-research-services/legal-research-services (CSM researchers.)<br />
*https://research.ucalgary.ca/contact/research-services (Not CSM Researchers)<br />
*https://www.ucalgary.ca/legalservices/ (for teaching/learning – non research enquiries that make their way to you)<br />
<br />
==Office365 SharePoint for research groups==<br />
<br />
To be determined....<br />
<br />
Researchers will be able to request an Office 365 SharePoint site for a group at some point in the future <br />
which could be considered a group cloud sharing platform.<br />
<br />
==Personal storage options==<br />
For personal or level 1 data, you may use external solutions from WestGrid or Compute Canada.<br />
<br />
*'''WestGrid ownCloud''':<br />
:Information: https://www.westgrid.ca/resources_services/data_storage/cloud_storage<br />
:Youtube video introduction: https://www.youtube.com/watch?time_continue=6&v=szPNNySx_Hk&feature=emb_logo<br />
:Access portal: https://owncloud.westgrid.ca/<br />
*'''Compute Canada NextCloud''':<br />
:https://nextcloud.computecanada.ca<br />
<br />
== Home Directories and Research Group Allocations on ARC ==<br />
<br />
ARC storage is used to support workflows on the ARC computing cluster. The expectation is that storage on ARC will only be used for active and upcoming computational projects. It is not suitable for long-term or archival storage as it is not backed-up and is not guaranteed to be available for the time periods that are typical of archiving.<br />
<br />
=== Home Directories ===<br />
Every user account on ARC has a static 500GB allocation of storage and a maximum of 1.5 million files (including directories). This cannot be increased or decreased. Home directory storage is connected via a network file system to the rest of the cluster and supports fast data transfer to memory on compute nodes. This also means that basic file system commands (like <code>ls</code>, <code>find</code>, and <code>du</code>) take longer to run as the number of files in your home directory increases. In particular, we strongly encourage users to stay under 100000 files if it is at all possible. This can be achieved by combining smaller data files into single larger files, using structured data formats rather than large number of text files, or combining collections of files that will be used together into archives (tar, dar, etc). Since top level permissions on home directories are set to prevent other users from reading or executing, home directories are not suitable for sharing data directly with colleagues working on ARC. A Research Group Allocation is a more appropriate place for storing shared data or very large data sets that will be used as part of active computational projects. <br />
<br />
=== Research Group Allocations (<code>/work</code> and <code>/bulk</code>) ===<br />
The principal investigator (PI) for a research group may request an extended shared allocation for the research group by contacting support@hpc.ucalgary.ca with answers to the following questions (please copy the full text of the questions into your email and write answers under it):<br />
<br />
* How much storage is requested and why is that the amount that you need? <br />
A rationale for a request can be a formal data management plan or something more informal like a rough estimate to the primary dataset used for a project and a rough estimate to the size of outputs expected from your computations that are planned to run on ARC. <br />
<br />
Example 1: "We will be processing a 3T dataset consisting of 1000 experimental runs. Each experiment will be processed to produce a 6GB output and we will need some further space for post processing. We would like to request 12TB total." <br />
<br />
Example 2: "Our research group has 5 members with separate projects. 3 have projects that will use 1TB of data and 2 have projects that will require 3TB of data. We would like to request 10TB total."<br />
<br />
* What is the requested allocation name? (typically something like <PI name>_lab)<br />
* What is the data classification using the University of Calgary data security classification system?<br />
* Which user or users would be the owner of the allocation? (Full Name and UCalgary Email address, typically the requesting PI but there may be co-PIs)<br />
* Which members of the allocation should be able to request access for new users? (Full Name and UCalgary Email address for active ARC users) <br />
* What is the faculty of the owner or owners? <br />
* Please provide a short description of the lab or project that will use the allocation.<br />
<br />
Work and Bulk storage can be considerably larger than the home directory allocations. However, there are limits on what RCS can provide as ARC storage provides high speed access and is expensive to purchase. Typically, any request over 10TB will require some discussion. Work and Bulk allocations differ in a few ways that influence how they are used. Work storage is faster to access as part of computational jobs on ARC although the impact is small for jobs that don't involve enormous numbers of reads. Bulk storage is designed to be a target for instrument data (which is typically processed in a way that reads data a small number of times per job) and is capable of mounting instruments elsewhere on campus using SMB. A number of questions come up frequently about Work and Bulk storage and these are addressed in an [[Group Storage Allocation FAQ | FAQ]].<br />
<br />
<br />
[[Category:Guides]]</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=Storage_Options&diff=1514Storage Options2021-10-13T19:12:12Z<p>Tthomas: /* OneDrive for Business */</p>
<hr />
<div>There are a few options researchers can take advantage of when storing their research data. <br />
<br />
== Data Classification ==<br />
Please review the different data classifications that are outlined by the [https://ucalgary.ca/legal-services/sites/default/files/teams/1/Standards-Legal-Information-Security-Classification-Standard.pdf Information Security Classification Standard]. There are 4 levels of data classification which are summarized in the table below.<br />
<br />
{| class="wikitable"<br />
! Level<br />
! Description<br />
! Example<br />
|-<br />
| Level 1<br />
| Public<br />
|<br />
* Reference data sets<br />
* Published research data<br />
|-<br />
| Level 2<br />
| Internal<br />
|<br />
* Internal memos<br />
* Unpublished research data<br />
* Anonymized or de-identified human subject data<br />
* Library transactions and journals<br />
|-<br />
| Level 3<br />
| Confidential<br />
|<br />
* Faculty/staff employment applications, personnel files, contact information<br />
* Donor or prospective donor information<br />
* Contracts<br />
* Intellectual property<br />
|-<br />
| Level 4<br />
| Restricted<br />
|<br />
* Patient identifiable health information<br />
* identifiable human subject research data<br />
* information subject to special government requirements<br />
|}<br />
<br />
When selecting a storage option, you must use one that meets or exceeds the rated security classification.<br />
<br />
* See also the Collaboration, storage and file shares article in Service Now:<br />
: https://ucalgary.service-now.com/it?id=it_catalog_by_category&sys_id=4dbb82ee13661200c524fc04e144b044<br />
<br />
== Research Data Management ==<br />
We recommend you follow good Research Data Management practices and ensure you have a DMP (Data Management Plan) created to guide your data's lifecycle. DMP Assistant has been created specifically for Canadian scholars and aims to meet any and all Tri-Agency requirements. See: https://assistant.portagenetwork.ca/<br />
<br />
Your DMP can help us support the FAIR (findable, accessible, interoperable and reusable) principles for data management.<br />
<br />
Please consider contacting Libraries and Cultural Resources for assistance. For guidance on general data management and developing a DMP, consult https://library.ucalgary.ca/guides/researchdatamanagement or contact research.data@ucalgary.ca.<br />
<br />
For support using PRISM Dataverse, UofC's institutional data repository, contact digitize@ucalgary.ca.<br />
<br />
If you need to share and preserve your large post-publication data set for a mandated period of time, please visit https://www.frdr-dfdr.ca/repo/ in order to learn more about the national Federated Research Data Repository. <br />
<br />
FRDR aligns with Tri-Agency Principles as a platform for Preservation, Retention and Sharing of research data. see: [http://www.science.gc.ca/eic/site/063.nsf/eng/h_83F7624E.html Tri-Agency Statement of Principles on Digital Data Management]<br />
<br />
<br />
== Secure Compute Data Storage (SCDS) ==<br />
Secure Computing Data Storage (SCDS) is a service provided by Research Computing Services that allows researchers to store restricted and confidential data. Collaboration with Level 4 data stored in SCDS is possible using ShareFile, a secure file sharing and collaboration tool by Citrix.<br />
<br />
{| class="wikitable"<br />
! Capacity<br />
| 10 GB or more<br />
|-<br />
! Classification<br />
| Level 4<br />
|-<br />
! Learn More<br />
| Visit [https://it.ucalgary.ca/secure-computing-platform The SCDS Website]<br />
|-<br />
! Request Access<br />
| Visit [https://ucalgary.service-now.com/kb_view.do?sysparm_article=KB0030163 ServiceNow to request access]<br />
|}<br />
<br />
== AcademicFS ==<br />
AcademicFS is a UofC hosted SMB/CIFS storage solution funded and operated by RCS. It is available by request to faculty and staff with active research data.<br />
{| class="wikitable"<br />
! Capacity<br />
| 100GB with quota increases available on request. <br />
|-<br />
! Classification<br />
| Level 1 - 2<br />
|-<br />
! Request Access<br />
| Visit [https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=fe66b3a7db297300897e4b8b0b96199d ServiceNow to request access]<br />
|}<br />
=== Service Description ===<br />
You may use AcademicFS to store your active research data files. AcademicFS is intended to be used as a research group or project share. AcademicFS is available on campus or off campus using the IT supported VPN client. Information on how to download and install the VPN client can be found here: https://ucalgary.service-now.com/it?id=kb_article&sys_id=880e71071381ae006f3afbb2e144b05c (IT account login may be required).<br />
All AcademicFS users must have a UofC IT account.<br />
<br />
=== Data recovery ===<br />
AcademicFS does daily snapshots at a bit past midnight, which it keeps for 30 days. You should be able to recover a deleted file for up to 30 days, if it was in your share overnight. If you create a file and delete it during a day, no snapshot will be available for you to recover. AcademicFS presents backups using the windows OS 'previous versions' functionality. If you are not familiar with using this, or if you are on a Linux or MacOS device, you can request a restore, with Service Now.<br />
<br />
For backup, we replicate changes to a distant data center every hour. The storage hardware which hosts your data is located in the basement of the Math Sciences building and our backup is in the HRIC building, so in case of an on campus disaster, your data should be safe.<br />
<br />
=== Support for AcademicFS ===<br />
If you have questions, please contact the IT Support Centre.<br />
: Mon – Fri: 8:30 am – 5:00 pm; Sat, Sun & holidays: 10:00 am – 2:00 pm.<br />
: Live Chat: ucalgary.ca/it<br />
: Email: itsupport@ucalgary.ca<br />
: Phone: 403.220.5555<br />
: In person: 773 Math Science<br />
<br />
<br />
<br />
== OneDrive for Business ==<br />
OneDrive for Business is a storage solution provided by Microsoft and is available by request to all faculty and staff.<br />
{| class="wikitable"<br />
! Capacity<br />
| 5 TB<br />
|-<br />
! Classification<br />
| Level 1 - 4<br />
|-<br />
! Request Access<br />
| Visit [https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=522b68ebdb83e700897e4b8b0b961997 ServiceNow to request access]<br />
|}<br />
<br />
You may use OneDrive for Business to store your personal and work related files. <br />
Files stored within OneDrive are private to you by default, but can be shared with others for collaboration. <br />
OneDrive for Business cannot be used as a department or project share space.<br />
There is no group/lab offering with OneDrive. <br />
<br />
<br />
While OneDrive provides a secure/compliant location from an IT Security stand point, <br />
it’s not the most adequate location for data the PI is accountable for 5 years upon completion of the study. <br />
This is not a security issue, but a data management issue.<br />
OneDrive could be great for short term needs, <br />
but '''SCDS''' (see above) is the best option to store all research related electronic records for the long run. <br />
For example, if a study was using a personal OneDrive of one of the researchers to store all the records, <br />
and the researcher was to leave the university, this OneDrive would be gone in 30 days. <br />
That would never happen with SCDS where we are able to track discrete containers which remain associated to the PI and the REB project as one entity.<br />
<br />
<br />
MS has an automation capability for their O365 products.<br />
If you have a windows OS machine, you can use the automation product ‘Flow’ to copy a file to a local file system when a new file is created on OneDrive.<br />
<br />
To back up data residing on ARC to your personal OneDrive allocation please see: [[How to transfer data#rclone: rsync for cloud storage]]<br />
<br />
OneDrive requires Multi-Factor Authentication (MFA) enabled on your University of Calgary IT account. <br />
<br />
<br />
UofC OneDrive data is reportedly hosted in Canada (Markham Ont).<br />
<br />
===Request Access===<br />
To request OneDrive for Business:<br />
#Submit your request on ServiceNow using the OneDrive for Business request form (https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=522b68ebdb83e700897e4b8b0b961997)<br />
#The IT Support Centre will contact you<br />
#Set a time with IT Support Centre to turn on MFA<br />
# Turn on MFA<br />
#Turn on OneDrive for Business<br />
<br />
===Support for OneDrive for Business===<br />
If you have questions, please contact the IT Support Centre.<br />
: Mon – Fri: 8:30 am – 5:00 pm; Sat, Sun & holidays: 10:00 am – 2:00 pm.<br />
:Live Chat: ucalgary.ca/it<br />
:Email: itsupport@ucalgary.ca<br />
: Phone: 403.220.5555<br />
:In person: 773 Math Science<br />
<br />
===Data recovery===<br />
<br />
===Other Resources===<br />
For more information on OneDrive for Business:<br />
* Operating Level of Agreement KB0032404 (https://ucalgary.service-now.com/it?id=kb_article&sys_id=7f57bddcdb56a3047cab5068dc9619b6)<br />
*OneDrive for Business Getting Started KB0032351 (https://ucalgary.service-now.com/it?id=kb_article&sys_id=60994170db2da7487cab5068dc961900)<br />
*If you are at or above 90% of your OneDrive quota, you can request an increase here: ( https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=438e6d8313896a0053f2d7b2e144b0b9) PLEASE NOTE: Microsoft will only increase an allocation while the Cloud Storage is more than 90% full. Please log into your O365 cloud account to review before making your request.<br />
<br />
Any questions regarding whether data hosted on OneDrive is subject to US jurisdiction discovery or access should be directed to:<br />
*https://cumming.ucalgary.ca/research-institutes/csm-research-services/legal-research-services (CSM researchers.)<br />
*https://research.ucalgary.ca/contact/research-services (Not CSM Researchers)<br />
*https://www.ucalgary.ca/legalservices/ (for teaching/learning – non research enquiries that make their way to you)<br />
<br />
==Office365 SharePoint for research groups==<br />
<br />
To be determined....<br />
<br />
Researchers will be able to request an Office 365 SharePoint site for a group at some point in the future <br />
which could be considered a group cloud sharing platform.<br />
<br />
==Personal storage options==<br />
For personal or level 1 data, you may use external solutions from WestGrid or Compute Canada.<br />
<br />
*'''WestGrid ownCloud''':<br />
:Information: https://www.westgrid.ca/resources_services/data_storage/cloud_storage<br />
:Youtube video introduction: https://www.youtube.com/watch?time_continue=6&v=szPNNySx_Hk&feature=emb_logo<br />
:Access portal: https://owncloud.westgrid.ca/<br />
*'''Compute Canada NextCloud''':<br />
:https://nextcloud.computecanada.ca<br />
<br />
== Home Directories and Research Group Allocations on ARC ==<br />
<br />
ARC storage is used to support workflows on the ARC computing cluster. The expectation is that storage on ARC will only be used for active and upcoming computational projects. It is not suitable for long-term or archival storage as it is not backed-up and is not guaranteed to be available for the time periods that are typical of archiving.<br />
<br />
=== Home Directories ===<br />
Every user account on ARC has a static 500GB allocation of storage and a maximum of 1.5 million files (including directories). This cannot be increased or decreased. Home directory storage is connected via a network file system to the rest of the cluster and supports fast data transfer to memory on compute nodes. This also means that basic file system commands (like <code>ls</code>, <code>find</code>, and <code>du</code>) take longer to run as the number of files in your home directory increases. In particular, we strongly encourage users to stay under 100000 files if it is at all possible. This can be achieved by combining smaller data files into single larger files, using structured data formats rather than large number of text files, or combining collections of files that will be used together into archives (tar, dar, etc). Since top level permissions on home directories are set to prevent other users from reading or executing, home directories are not suitable for sharing data directly with colleagues working on ARC. A Research Group Allocation is a more appropriate place for storing shared data or very large data sets that will be used as part of active computational projects. <br />
<br />
=== Research Group Allocations (<code>/work</code> and <code>/bulk</code>) ===<br />
The principal investigator (PI) for a research group may request an extended shared allocation for the research group by contacting support@hpc.ucalgary.ca with answers to the following questions (please copy the full text of the questions into your email and write answers under it):<br />
<br />
* How much storage is requested and why is that the amount that you need? <br />
A rationale for a request can be a formal data management plan or something more informal like a rough estimate to the primary dataset used for a project and a rough estimate to the size of outputs expected from your computations that are planned to run on ARC. <br />
<br />
Example 1: "We will be processing a 3T dataset consisting of 1000 experimental runs. Each experiment will be processed to produce a 6GB output and we will need some further space for post processing. We would like to request 12TB total." <br />
<br />
Example 2: "Our research group has 5 members with separate projects. 3 have projects that will use 1TB of data and 2 have projects that will require 3TB of data. We would like to request 10TB total."<br />
<br />
* What is the requested allocation name? (typically something like <PI name>_lab)<br />
* What is the data classification using the University of Calgary data security classification system?<br />
* Which user or users would be the owner of the allocation? (Full Name and UCalgary Email address, typically the requesting PI but there may be co-PIs)<br />
* Which members of the allocation should be able to request access for new users? (Full Name and UCalgary Email address for active ARC users) <br />
* What is the faculty of the owner or owners? <br />
* Please provide a short description of the lab or project that will use the allocation.<br />
<br />
Work and Bulk storage can be considerably larger than the home directory allocations. However, there are limits on what RCS can provide as ARC storage provides high speed access and is expensive to purchase. Typically, any request over 10TB will require some discussion. Work and Bulk allocations differ in a few ways that influence how they are used. Work storage is faster to access as part of computational jobs on ARC although the impact is small for jobs that don't involve enormous numbers of reads. Bulk storage is designed to be a target for instrument data (which is typically processed in a way that reads data a small number of times per job) and is capable of mounting instruments elsewhere on campus using SMB. A number of questions come up frequently about Work and Bulk storage and these are addressed in an [[Group Storage Allocation FAQ | FAQ]].<br />
<br />
<br />
[[Category:Guides]]</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=Storage_Options&diff=1512Storage Options2021-10-13T18:59:27Z<p>Tthomas: /* Data recovery */</p>
<hr />
<div>There are a few options researchers can take advantage of when storing their research data. <br />
<br />
== Data Classification ==<br />
Please review the different data classifications that are outlined by the [https://ucalgary.ca/legal-services/sites/default/files/teams/1/Standards-Legal-Information-Security-Classification-Standard.pdf Information Security Classification Standard]. There are 4 levels of data classification which are summarized in the table below.<br />
<br />
{| class="wikitable"<br />
! Level<br />
! Description<br />
! Example<br />
|-<br />
| Level 1<br />
| Public<br />
|<br />
* Reference data sets<br />
* Published research data<br />
|-<br />
| Level 2<br />
| Internal<br />
|<br />
* Internal memos<br />
* Unpublished research data<br />
* Anonymized or de-identified human subject data<br />
* Library transactions and journals<br />
|-<br />
| Level 3<br />
| Confidential<br />
|<br />
* Faculty/staff employment applications, personnel files, contact information<br />
* Donor or prospective donor information<br />
* Contracts<br />
* Intellectual property<br />
|-<br />
| Level 4<br />
| Restricted<br />
|<br />
* Patient identifiable health information<br />
* identifiable human subject research data<br />
* information subject to special government requirements<br />
|}<br />
<br />
When selecting a storage option, you must use one that meets or exceeds the rated security classification.<br />
<br />
* See also the Collaboration, storage and file shares article in Service Now:<br />
: https://ucalgary.service-now.com/it?id=it_catalog_by_category&sys_id=4dbb82ee13661200c524fc04e144b044<br />
<br />
== Research Data Management ==<br />
We recommend you follow good Research Data Management practices and ensure you have a DMP (Data Management Plan) created to guide your data's lifecycle. DMP Assistant has been created specifically for Canadian scholars and aims to meet any and all Tri-Agency requirements. See: https://assistant.portagenetwork.ca/<br />
<br />
Your DMP can help us support the FAIR (findable, accessible, interoperable and reusable) principles for data management.<br />
<br />
Please consider contacting Libraries and Cultural Resources for assistance. For guidance on general data management and developing a DMP, consult https://library.ucalgary.ca/guides/researchdatamanagement or contact research.data@ucalgary.ca.<br />
<br />
For support using PRISM Dataverse, UofC's institutional data repository, contact digitize@ucalgary.ca.<br />
<br />
If you need to share and preserve your large post-publication data set for a mandated period of time, please visit https://www.frdr-dfdr.ca/repo/ in order to learn more about the national Federated Research Data Repository. <br />
<br />
FRDR aligns with Tri-Agency Principles as a platform for Preservation, Retention and Sharing of research data. see: [http://www.science.gc.ca/eic/site/063.nsf/eng/h_83F7624E.html Tri-Agency Statement of Principles on Digital Data Management]<br />
<br />
== Home Directories and Research Group Allocations on ARC ==<br />
<br />
ARC storage is used to support workflows on the ARC computing cluster. The expectation is that storage on ARC will only be used for active and upcoming computational projects. It is not suitable for long-term or archival storage as it is not backed-up and is not guaranteed to be available for the time periods that are typical of archiving.<br />
<br />
=== Home Directories ===<br />
Every user account on ARC has a static 500GB allocation of storage and a maximum of 1.5 million files (including directories). This cannot be increased or decreased. Home directory storage is connected via a network file system to the rest of the cluster and supports fast data transfer to memory on compute nodes. This also means that basic file system commands (like <code>ls</code>, <code>find</code>, and <code>du</code>) take longer to run as the number of files in your home directory increases. In particular, we strongly encourage users to stay under 100000 files if it is at all possible. This can be achieved by combining smaller data files into single larger files, using structured data formats rather than large number of text files, or combining collections of files that will be used together into archives (tar, dar, etc). Since top level permissions on home directories are set to prevent other users from reading or executing, home directories are not suitable for sharing data directly with colleagues working on ARC. A Research Group Allocation is a more appropriate place for storing shared data or very large data sets that will be used as part of active computational projects. <br />
<br />
=== Research Group Allocations (<code>/work</code> and <code>/bulk</code>) ===<br />
The principal investigator (PI) for a research group may request an extended shared allocation for the research group by contacting support@hpc.ucalgary.ca with answers to the following questions (please copy the full text of the questions into your email and write answers under it):<br />
<br />
* How much storage is requested and why is that the amount that you need? <br />
A rationale for a request can be a formal data management plan or something more informal like a rough estimate to the primary dataset used for a project and a rough estimate to the size of outputs expected from your computations that are planned to run on ARC. <br />
<br />
Example 1: "We will be processing a 3T dataset consisting of 1000 experimental runs. Each experiment will be processed to produce a 6GB output and we will need some further space for post processing. We would like to request 12TB total." <br />
<br />
Example 2: "Our research group has 5 members with separate projects. 3 have projects that will use 1TB of data and 2 have projects that will require 3TB of data. We would like to request 10TB total."<br />
<br />
* What is the requested allocation name? (typically something like <PI name>_lab)<br />
* What is the data classification using the University of Calgary data security classification system?<br />
* Which user or users would be the owner of the allocation? (Full Name and UCalgary Email address, typically the requesting PI but there may be co-PIs)<br />
* Which members of the allocation should be able to request access for new users? (Full Name and UCalgary Email address for active ARC users) <br />
* What is the faculty of the owner or owners? <br />
* Please provide a short description of the lab or project that will use the allocation.<br />
<br />
Work and Bulk storage can be considerably larger than the home directory allocations. However, there are limits on what RCS can provide as ARC storage provides high speed access and is expensive to purchase. Typically, any request over 10TB will require some discussion. Work and Bulk allocations differ in a few ways that influence how they are used. Work storage is faster to access as part of computational jobs on ARC although the impact is small for jobs that don't involve enormous numbers of reads. Bulk storage is designed to be a target for instrument data (which is typically processed in a way that reads data a small number of times per job) and is capable of mounting instruments elsewhere on campus using SMB. A number of questions come up frequently about Work and Bulk storage and these are addressed in an [[Group Storage Allocation FAQ | FAQ]].<br />
<br />
== Secure Compute Data Storage (SCDS) ==<br />
Secure Computing Data Storage (SCDS) is a service provided by Research Computing Services that allows researchers to store restricted and confidential data. Collaboration with Level 4 data stored in SCDS is possible using ShareFile, a secure file sharing and collaboration tool by Citrix.<br />
<br />
{| class="wikitable"<br />
! Capacity<br />
| 10 GB or more<br />
|-<br />
! Classification<br />
| Level 4<br />
|-<br />
! Learn More<br />
| Visit [https://it.ucalgary.ca/secure-computing-platform The SCDS Website]<br />
|-<br />
! Request Access<br />
| Visit [https://ucalgary.service-now.com/kb_view.do?sysparm_article=KB0030163 ServiceNow to request access]<br />
|}<br />
<br />
== AcademicFS ==<br />
AcademicFS is a UofC hosted SMB/CIFS storage solution funded and operated by RCS. It is available by request to faculty and staff with active research data.<br />
{| class="wikitable"<br />
! Capacity<br />
| 100GB with quota increases available on request. <br />
|-<br />
! Classification<br />
| Level 1 - 2<br />
|-<br />
! Request Access<br />
| Visit [https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=fe66b3a7db297300897e4b8b0b96199d ServiceNow to request access]<br />
|}<br />
=== Service Description ===<br />
You may use AcademicFS to store your active research data files. AcademicFS is intended to be used as a research group or project share. AcademicFS is available on campus or off campus using the IT supported VPN client. Information on how to download and install the VPN client can be found here: https://ucalgary.service-now.com/it?id=kb_article&sys_id=880e71071381ae006f3afbb2e144b05c (IT account login may be required).<br />
All AcademicFS users must have a UofC IT account.<br />
<br />
=== Data recovery ===<br />
AcademicFS takes daily snapshots shortly after midnight and keeps them for 30 days. You should be able to recover a deleted file for up to 30 days, provided it was in your share overnight. If you create and delete a file within the same day, no snapshot will be available to recover it from. AcademicFS presents snapshots using the Windows 'Previous Versions' functionality. If you are not familiar with using this, or if you are on a Linux or macOS device, you can request a restore through ServiceNow.<br />
<br />
For backup, we replicate changes to a distant data center every hour. The storage hardware which hosts your data is located in the basement of the Math Sciences building and our backup is in the HRIC building, so in case of an on campus disaster, your data should be safe.<br />
<br />
=== Support for AcademicFS ===<br />
If you have questions, please contact the IT Support Centre.<br />
: Mon – Fri: 8:30 am – 5:00 pm; Sat, Sun & holidays: 10:00 am – 2:00 pm.<br />
: Live Chat: ucalgary.ca/it<br />
: Email: itsupport@ucalgary.ca<br />
: Phone: 403.220.5555<br />
: In person: 773 Math Science<br />
<br />
<br />
<br />
== OneDrive for Business ==<br />
OneDrive for Business is a storage solution provided by Microsoft and is available by request to all faculty and staff.<br />
{| class="wikitable"<br />
! Capacity<br />
| 5 TB<br />
|-<br />
! Classification<br />
| Level 1 - 4<br />
|-<br />
! Request Access<br />
| Visit [https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=522b68ebdb83e700897e4b8b0b961997 ServiceNow to request access]<br />
|}<br />
<br />
You may use OneDrive for Business to store your personal and work related files. <br />
Files stored within OneDrive are private to you by default, but you have the option to share them and collaborate with others. <br />
OneDrive for Business cannot be used as a department or project share space.<br />
There is no group/lab offering with OneDrive. <br />
<br />
<br />
While OneDrive provides a secure and compliant location from an IT security standpoint, <br />
it is not the most appropriate location for data that the PI is accountable for retaining for 5 years after completion of the study. <br />
This is not a security issue, but a data management issue.<br />
OneDrive can be a good fit for short-term needs, <br />
but '''SCDS''' (see above) is the best option for storing all research-related electronic records for the long run. <br />
For example, if a study was using the personal OneDrive of one of the researchers to store all the records, <br />
and that researcher were to leave the university, the OneDrive account would be gone in 30 days. <br />
That would never happen with SCDS, where we are able to track discrete containers which remain associated with the PI and the REB project as one entity.<br />
<br />
<br />
Microsoft offers automation capabilities for its O365 products.<br />
If you have a Windows machine, you can use the automation product ‘Flow’ to copy a file to a local file system when a new file is created on OneDrive.<br />
<br />
To back up data residing on ARC to your personal OneDrive allocation please see: [[How to transfer data#rclone: rsync for cloud storage]]<br />
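<br />
As a rough illustration of what such a backup might look like (a minimal sketch, assuming you have already configured an rclone remote named <code>onedrive</code> by following the linked instructions; the paths shown are placeholders for your own directories):<br />
<syntaxhighlight lang="bash"><br />
# Copy a results directory from ARC into a folder on your OneDrive.<br />
# "onedrive" is the remote name chosen during `rclone config`; the source and<br />
# destination paths below are placeholders.<br />
rclone copy /home/username/project_results onedrive:ARC_backup/project_results --progress<br />
</syntaxhighlight><br />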
<br />
OneDrive requires Multi-Factor Authentication (MFA) enabled on your University of Calgary IT account. <br />
<br />
<br />
UofC OneDrive data is reportedly hosted in Canada (Markham Ont).<br />
<br />
===Request Access===<br />
To request OneDrive for Business:<br />
#Submit your request on ServiceNow using the OneDrive for Business request form (https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=522b68ebdb83e700897e4b8b0b961997)<br />
#The IT Support Centre will contact you<br />
#Set a time with the IT Support Centre to turn on MFA<br />
#Turn on MFA<br />
#Turn on OneDrive for Business<br />
<br />
===Support for OneDrive for Business===<br />
If you have questions, please contact the IT Support Centre.<br />
: Mon – Fri: 8:30 am – 5:00 pm; Sat, Sun & holidays: 10:00 am – 2:00 pm.<br />
:Live Chat: ucalgary.ca/it<br />
:Email: itsupport@ucalgary.ca<br />
: Phone: 403.220.5555<br />
:In person: 773 Math Science<br />
<br />
===Data recovery===<br />
<br />
===Other Resources===<br />
For more information on OneDrive for Business:<br />
* Operating Level of Agreement KB0032404 (https://ucalgary.service-now.com/it?id=kb_article&sys_id=7f57bddcdb56a3047cab5068dc9619b6)<br />
*OneDrive for Business Getting Started KB0032351 (https://ucalgary.service-now.com/it?id=kb_article&sys_id=60994170db2da7487cab5068dc961900)<br />
<br />
Any questions regarding whether data hosted on OneDrive is subject to US jurisdiction discovery or access should be directed to:<br />
*https://cumming.ucalgary.ca/research-institutes/csm-research-services/legal-research-services (CSM researchers.)<br />
*https://research.ucalgary.ca/contact/research-services (Not CSM Researchers)<br />
*https://www.ucalgary.ca/legalservices/ (for teaching/learning – non research enquiries that make their way to you)<br />
<br />
==Office365 SharePoint for research groups==<br />
<br />
To be determined....<br />
<br />
Researchers will be able to request an Office 365 SharePoint site for a group at some point in the future, <br />
which could serve as a group cloud-sharing platform.<br />
<br />
==Personal storage options==<br />
For personal or level 1 data, you may use external solutions from WestGrid or Compute Canada.<br />
<br />
*'''WestGrid ownCloud''':<br />
:Information: https://www.westgrid.ca/resources_services/data_storage/cloud_storage<br />
:Youtube video introduction: https://www.youtube.com/watch?time_continue=6&v=szPNNySx_Hk&feature=emb_logo<br />
:Access portal: https://owncloud.westgrid.ca/<br />
*'''Compute Canada NextCloud''':<br />
:https://nextcloud.computecanada.ca<br />
<br />
[[Category:Guides]]</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=ARC_Storage_Terms_of_Use&diff=1442ARC Storage Terms of Use2021-07-09T14:42:59Z<p>Tthomas: </p>
<hr />
<div>There are several options for disk storage on the ARC Cluster. Please review this section carefully to decide where to place your data. Contact systems staff at support@hpc.ucalgary.ca if you have any questions.<br />
As this is a limited resource, please use the space responsibly. Disk space on the ARC Cluster should not be used as archival storage, nor should it be used as a backup (2nd copy location) for other systems (e.g. desktops, laptops, etc.).<br />
<br />
== Summary of File Storage Options ==<br />
{| class=wikitable<br />
! File System !! Type !! Snapshots !! Backups !! Quota<br />
|-<br />
! /home <br />
| NetApp<br />
FAS8200, NFS <br />
| 7 days <br />
| No <br />
| 500 GB/user<br />
|-<br />
! /scratch <br />
| NetApp<br />
FAS8200, NFS <br />
| None <br />
| No <br />
| 15 TB/user<br />
|-<br />
! /work <br />
| NetApp<br />
FAS8200, NFS <br />
| 7 days <br />
| No <br />
| By request/group<br />
|-<br />
! /bulk <br />
| NetApp<br />
FAS2720, NFS <br />
| 7 days <br />
| No <br />
| By request/group<br />
|-<br />
! /tmp <br />
| Local <br />
| None <br />
| No <br />
| 100 GB<br />
|}<br />
<br />
Currently, all users are responsible for managing their own backups. You can back up data to your personal UofC OneDrive for Business cloud storage; see [[How to transfer data#rclone:%20rsync%20for%20cloud%20storage|https://rcs.ucalgary.ca/How_to_transfer_data#rclone:_rsync_for_cloud_storage]]. This allocation starts at 5 TB. Contact the Support Centre with questions regarding OneDrive for Business. <br />
<br />
== Performance ==<br />
{| class=wikitable<br />
! Bulk<br />
! Work<br />
|-<br />
|<br />
* Single node streaming R/W 540/730 MB/s<br />
* Aggregate (multi-node) R/W 4140/4400 MB/s<br />
* Untar Linux kernel (~39000 files): 3:09 (m:ss)<br />
|<br />
* Single node streaming R/W 620/750 MB/s<br />
* Aggregate (multi-node) R/W 6131/5616 MB/s<br />
* Untar Linux kernel (~39000 files): 2:30 (m:ss)<br />
|}<br />
<br />
<br />
===/home===<br />
Each user has a home directory called /home/username. The /home file system has a quota of 500 GB per user which cannot be increased, and seven days of snapshots.<br />
===/scratch===<br />
/scratch is intended for temporary data for the duration of the job.<br />
<br />
Directories in /scratch are created when a user job starts. The naming of the directory is /scratch/JOBID, where JOBID is the job id that is assigned by Slurm. It is expected that jobs clean up the contents of their scratch directory when the job completes, as part of the Slurm batch job. Any files older than 10 days will be automatically deleted.<br />
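<br />
A minimal sketch of a batch job that works in its per-job scratch directory and cleans it up at the end (the program name and file names are placeholders, and the resource requests should be adjusted to your own workload):<br />
<syntaxhighlight lang="bash"><br />
#!/bin/bash<br />
#SBATCH --ntasks=1<br />
#SBATCH --cpus-per-task=1<br />
#SBATCH --mem=4G<br />
#SBATCH --time=02:00:00<br />
<br />
# Work inside the per-job scratch directory created by Slurm.<br />
cd /scratch/${SLURM_JOB_ID}<br />
<br />
# Stage input data in, run the computation, and copy results back to /home.<br />
# "myprogram", input.dat and output.dat are placeholders for your own workflow.<br />
cp /home/${USER}/project/input.dat .<br />
/home/${USER}/project/bin/myprogram input.dat > output.dat<br />
cp output.dat /home/${USER}/project/results/<br />
<br />
# Clean up the scratch contents as part of the job, as requested above.<br />
rm -rf /scratch/${SLURM_JOB_ID}/*<br />
</syntaxhighlight><br />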
<br />
Each user can store up to a maximum of 15 TB in /scratch. However, due to the shared nature of /scratch, we cannot make any guarantees that the full 15 TB will be available at any given time.<br />
<br />
If /scratch becomes more than 75% full, we reserve the right to delete the files as needed.<br />
===/work===<br />
/work is intended for projects whose data requirements exceed the storage allocation for /home. /work is requested by the Principal Investigator (PI) on behalf of the entire group. The request should be made as needed.<br />
<br />
Please contact us at support@hpc.ucalgary.ca for further assistance.<br />
<br />
7-day snapshots are available.<br />
===/bulk===<br />
/bulk is intended for large allocations with lower I/O needs, primarily streaming reads and writes. /bulk is requested by the Principal Investigator (PI) on behalf of the entire group. The request should be made as needed. Please contact us at support@hpc.ucalgary.ca for further assistance.<br />
<br />
7-day snapshots are available.<br />
<br />
A portion of the /bulk file system is available to scientific instruments on campus on a request basis.<br />
<br />
Please contact support@hpc.ucalgary.ca for further assistance.<br />
<br />
==Best Practices==<br />
<br />
{| class=wikitable<br />
! Bad<br />
! Good<br />
|-<br />
|<br />
* Submitting many jobs without a good understanding of how much data it would generate<br />
* Directories with millions of files<br />
* Excessively long file names<br />
* The /bulk or /work file systems should not be used as a temporary directory for applications<br />
* Multiple copies of same data between individual researchers<br />
* Using ARC storage for archiving <br />
* Not ensuring your data and derived results are backed up outside of ARC storage<br />
|<br />
* Run a test job, and scale the data requirements appropriately<br />
* Use appropriate file formats or tools that reduce the number of files<br />
* Use short, descriptive names for files<br />
* Use /scratch or /tmp file systems instead <br />
* Request for a shared space <br />
* Contact support@hpc.ucalgary.ca to discuss requirements<br />
|}<br />
<br />
Please note that we reserve the right to perform emergency and critical maintenance at any moment with minimal warning.<br />
<br />
<br />
__NOTOC__</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=ARC_Storage_Terms_of_Use&diff=1441ARC Storage Terms of Use2021-07-09T14:31:52Z<p>Tthomas: added link for backing up ARC data to OneDrive</p>
<hr />
<div>There are several options for disk storage on the ARC Cluster. Please review this section carefully to decide where to place your data. Contact systems staff at support@hpc.ucalgary.ca if you have any questions.<br />
As this is a limited resource, please use the space responsibly. Disk space on the ARC Cluster should not be used as archival storage, nor should it be used as a backup (2nd copy location) for other systems (e.g. desktops, laptops, etc.).<br />
<br />
== Summary of File Storage Options ==<br />
{| class=wikitable<br />
! File System !! Type !! Snapshots !! Backups !! Quota<br />
|-<br />
! /home <br />
| NetApp<br />
FAS8200, NFS <br />
| 7 days <br />
| No <br />
| 500 GB/user<br />
|-<br />
! /scratch <br />
| NetApp<br />
FAS8200, NFS <br />
| None <br />
| No <br />
| 15 TB/user<br />
|-<br />
! /work <br />
| NetApp<br />
FAS8200, NFS <br />
| 7 days <br />
| No <br />
| By request/group<br />
|-<br />
! /bulk <br />
| NetApp<br />
FAS2720, NFS <br />
| 7 days <br />
| No <br />
| By request/group<br />
|-<br />
! /tmp <br />
| Local <br />
| None <br />
| No <br />
| 100 GB<br />
|}<br />
<br />
Currently, all users are responsible for managing their own backups. You can back up data to your personal UofC OneDrive for Business cloud storage; see [[How to transfer data#rclone:%20rsync%20for%20cloud%20storage|https://rcs.ucalgary.ca/How_to_transfer_data#rclone:_rsync_for_cloud_storage]]. This allocation starts at 5 TB. Contact the Support Centre with questions regarding OneDrive for Business. <br />
<br />
== Performance ==<br />
{| class=wikitable<br />
! Bulk<br />
! Work<br />
|-<br />
|<br />
* Single node streaming R/W 540/730 MB/s<br />
* Aggregate (multi-node) R/W 4140/4400 MB/s<br />
* Untar Linux kernel (~39000 files): 3:09 (m:ss)<br />
|<br />
* Single node streaming R/W 620/750 MB/s<br />
* Aggregate (multi-node) R/W 6131/5616 MB/s<br />
* Untar Linux kernel (~39000 files): 2:30 (m:ss)<br />
|}<br />
<br />
<br />
===/home===<br />
Each user has a home directory called /home/username. The /home file system has a quota of 500 GB per user which cannot be increased, and seven days of snapshots.<br />
===/scratch===<br />
/scratch is intended for temporary data for the duration of the job.<br />
<br />
Directories in /scratch are created when a user job starts. The naming of the directory is /scratch/JOBID, where JOBID is the job id that is assigned by Slurm. It is expected that jobs clean up the contents of their scratch directory when the job completes, as part of the Slurm batch job. Any files older than 10 days will be automatically deleted.<br />
<br />
Each user can store up to a maximum of 15 TB in /scratch. However, due to the shared nature of /scratch, we cannot make any guarantees that the full 15 TB will be available at any given time.<br />
<br />
If /scratch becomes more than 75% full, we reserve the right to delete the files as needed.<br />
===/work===<br />
/work is intended for projects whose data requirements exceed the storage allocation for /home. /work is requested by the Principal Investigator (PI) on behalf of the entire group. The request should be made as needed.<br />
<br />
Please contact us at support@hpc.ucalgary.ca for further assistance.<br />
<br />
7-day snapshots are available.<br />
===/bulk===<br />
/bulk is intended for large allocations with lower I/O needs, primarily streaming reads and writes. /bulk is requested by the Principal Investigator (PI) on behalf of the entire group. The request should be made as needed. Please contact us at support@hpc.ucalgary.ca for further assistance.<br />
<br />
7-day snapshots are available.<br />
<br />
A portion of the /bulk file system is available to scientific instruments on campus on a request basis.<br />
<br />
Please contact support@hpc.ucalgary.ca for further assistance.<br />
<br />
==Best Practices==<br />
<br />
{| class=wikitable<br />
! Bad<br />
! Good<br />
|-<br />
|<br />
* Submitting many jobs without a good understanding of how much data it would generate<br />
* Directories with millions of files<br />
* Excessively long file names<br />
* The /bulk or /work file systems should not be used as a temporary directory for applications<br />
* Multiple copies of same data between individual researchers<br />
* Using ARC storage for archiving <br />
|<br />
* Run a test job, and scale the data requirements appropriately<br />
* Use appropriate file formats or tools that reduce the number of files<br />
* Use short, descriptive names for files<br />
* Use /scratch or /tmp file systems instead <br />
* Request for a shared space <br />
* Contact support@hpc.ucalgary.ca to discuss requirements<br />
|}<br />
<br />
Please note that we reserve the right to perform emergency and critical maintenance at any moment with minimal warning.<br />
<br />
<br />
__NOTOC__</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Guide&diff=1440ARC Cluster Guide2021-07-09T14:29:05Z<p>Tthomas: added link for backing up ARC data to OneDrive</p>
<hr />
<div>{{Message Box<br />
|icon=Security Icon.png<br />
|title=Cybersecurity awareness at the U of C<br />
|message=Please note that there are typically about 950 phishing attempts targeting University of Calgary accounts each month. This is just a reminder to be careful about computer security issues, both at home and at the University. Please visit https://it.ucalgary.ca/it-security for more information, tips on secure computing, and how to report suspected security problems.}}<br />
<br />
{{Message Box<br />
|title=[[Support|Need Help or have other ARC Related Questions?]]<br />
|message=For all general RCS related issues, questions, or comments, please contact us at support@hpc.ucalgary.ca.<br />
|icon=Support Icon.png}}<br />
<br />
This guide gives an overview of the Advanced Research Computing (ARC) cluster at the University of Calgary and is intended to be read by new account holders getting started on ARC. This guide covers topics such as the hardware and performance characteristics, available software, usage policies and how to log in and run jobs. <br />
<br />
== Introduction ==<br />
The ARC compute cluster can be used for running large numbers (hundreds) of concurrent serial (one core) jobs, OpenMP or other thread-based jobs, shared-memory parallel code using up to 40 or 80 threads per job (depending on the partition), distributed-memory (MPI-based) parallel code using up to hundreds of cores, or jobs that take advantage of Graphics Processing Units (GPUs). Almost all work on ARC is done through a [[Linux Introduction|command line interface]]. This computational resource is available for research projects based at the University of Calgary and is meant to supplement the resources available to researchers through Compute Canada.<br />
<br />
Historically, ARC was composed primarily of older, disparate Linux-based clusters, such as Breezy, Lattice, and Parallel, that were formerly offered to researchers from across Canada. In addition, a large-memory compute node (Bigbyte) was salvaged from the now-retired local Storm cluster. In January 2019, a major addition of modern hardware was purchased for ARC. In 2020, compute clusters from CHGI were migrated into ARC.<br />
<br />
=== How to Get Started ===<br />
If you have a project you think would be appropriate for ARC, please write to support@hpc.ucalgary.ca and mention the intended research and software you plan to use. You must have a University of Calgary IT account in order to use ARC.<br />
* For users who do not have a University of Calgary IT account or email address, please register for one at https://itregport.ucalgary.ca/.<br />
* For users external to the University, such as for users collaborating on a research project at the University of Calgary, please contact us and mention the project leader you are collaborating with.<br />
<br />
Once your access to ARC has been granted, you will be able to immediately make use of the cluster using your University of Calgary IT account by following the [[ARC_Cluster_Guide#Using_ARC|usage guide outlined below]].<br />
<br />
== Hardware ==<br />
Since the ARC cluster is a conglomeration of many different compute clusters, the hardware within ARC can vary widely in terms of performance and capabilities. To mitigate any compatibility issues with different hardware, we combine similar hardware into their own Slurm partition to ensure your workload runs as consistently as possible within one partition. Please carefully review the hardware specs for each of the partitions below to avoid any surprises.<br />
<br />
=== Partition Hardware Specs ===<br />
When submitting jobs to ARC, you may specify a partition that your job will run on. Please choose a partition that is most appropriate for your work.<br />
<br />
A few things to keep in mind when choosing a partition:<br />
* Specific workloads requiring special Intel Instruction Set Extensions may only work on newer Intel CPUs. <br />
* If working with multi-node parallel processing, ensure your software and libraries support the partition's interconnect networking.<br />
* While older partitions may be slower, they may be less busy and have little to no wait times.<br />
<br />
If you are unsure which partition to use or need assistance on selecting an appropriate partition, please see [[#Selecting_a_Partition|the Selecting a Partition Section]] below. <br />
<br />
{| class="wikitable"<br />
! Partition<br />
! Description<br />
! Nodes<br />
! CPU Cores, Model, and Year<br />
! Memory<br />
! GPU<br />
! Network<br />
|-<br />
| -<br />
| ARC Login Node<br />
| 1<br />
| 16 cores, 2x Intel(R) Xeon(R) CPU E5620 @ 2.40GHz (2010)<br />
| 48 GB<br />
| N/A<br />
| 40 Gbit/s InfiniBand<br />
|-<br />
| gpu-v100<br />
| GPU Partition<br />
| 13<br />
| 80 cores, 2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (2019)<br />
| 754 GB<br />
| 2x Tesla V100-PCIE-16GB<br />
| 100 Gbit/s Omni-Path<br />
|-<br />
| cpu2019<br />
| General Purpose Compute<br />
| 14<br />
| 40 cores, 2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (2019)<br />
| 190 GB<br />
| N/A <br />
| 100 Gbit/s Omni-Path<br />
|-<br />
| apophis<br />
| General Purpose Compute<br />
| 21<br />
| 40 cores, 2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (2019)<br />
| 190 GB<br />
| N/A <br />
| 100 Gbit/s Omni-Path<br />
|-<br />
| razi<br />
| General Purpose Compute<br />
| 41<br />
| 40 cores, 2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (2019)<br />
| 190 GB<br />
| N/A <br />
| 100 Gbit/s Omni-Path<br />
|-<br />
| bigmem<br />
| Big Memory Nodes<br />
| 2<br />
| 80 cores, 4x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (2019)<br />
| 3022 GB<br />
| N/A <br />
| 100 Gbit/s Omni-Path<br />
|-<br />
| pawson<br />
| General Purpose Compute<br />
| 13<br />
| 40 cores, 2x Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz (2019)<br />
| 190 GB<br />
| N/A<br />
| 100 Gbit/s Omni-Path<br />
|-<br />
|cpu2017<br />
|General Purpose Compute<br />
|14<br />
|56 cores, 2x Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz (2016)<br />
|256 GB<br />
|N/A<br />
|40 Gbit/s InfiniBand<br />
|-<br />
| theia<br />
| Former Theia cluster<br />
| 20<br />
| 56 cores, 2x Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz (2012)<br />
| 188 GB<br />
| N/A <br />
| 40 Gbit/s InfiniBand<br />
|-<br />
| cpu2013<br />
| Former hyperion cluster<br />
| 12<br />
| 32 cores, 2x Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz (2012)<br />
| 126 GB<br />
| N/A<br />
| 40 Gbit/s InfiniBand<br />
|-<br />
| lattice<br />
| Former Lattice cluster<br />
| 307<br />
| 8 cores, 2x Intel(R) Xeon(R) CPU L5520 @ 2.27GHz (2011)<br />
| 12 GB<br />
| N/A<br />
| 40 Gbit/s InfiniBand<br />
|-<br />
| single<br />
| Former Lattice cluster<br />
| 168<br />
| 8 cores, 2x Intel(R) Xeon(R) CPU L5520 @ 2.27GHz (2011)<br />
| 12 GB<br />
| N/A<br />
| 40 Gbit/s InfiniBand<br />
|-<br />
| parallel<br />
| Former Parallel Cluster<br />
| 576<br />
| 12 cores, 2x Intel(R) Xeon(R) CPU E5649 @ 2.53GHz (2011)<br />
| 24 GB<br />
| N/A<br />
| 40 Gbit/s InfiniBand<br />
|}<br />
<br />
===ARC Cluster Storage===<br />
Usage of ARC cluster storage is outlined by our [[ARC Storage Terms of Use]] page.<br />
<br />
{{Message Box<br />
| title=No Backup Policy!<br />
| message=You are responsible for your own backups. Many researchers will have accounts with Compute Canada and may choose to back up their data there (the Project file system accessible through the Cedar cluster would often be used). <br />
<br />
Please contact us at support@hpc.ucalgary.ca if you want more information about this option.<br />
<br />
You can also back up data to your UofC OneDrive for business allocation see: https://rcs.ucalgary.ca/How_to_transfer_data#rclone:_rsync_for_cloud_storage This allocation starts at 5TB. Contact the support center for questions regarding OneDrive for Business.<br />
}}<br />
<br />
The ARC cluster has around 2 petabytes of shared disk storage available across the entire cluster, as well as temporary storage local to each of the compute nodes. Please refer to the individual sections below for capacity limitations and usage policies. <br />
<br />
Use the <code>arc.quota</code> command on ARC to determine the available space on your various volumes and home directory.<br />
<br />
{| class="wikitable"<br />
!Partition<br />
!Description<br />
!Capacity<br />
|-<br />
|<code>/home</code><br />
|User home directories<br />
|500 GB (per user)<br />
|-<br />
|<code>/work</code><br />
|Research project storage<br />
|Up to 100's of TB<br />
|-<br />
|<code>/scratch</code><br />
|Scratch space for temporary files<br />
|Up to 30 TB<br />
|-<br />
|<code>/tmp</code><br />
|Temporary space local to the compute cluster<br />
|Dependent on nodes, use <code>df -h</code>.<br />
|-<br />
|<code>/dev/shm</code><br />
|Small temporary in-memory disk space local to the compute cluster<br />
|Dependent on nodes, use <code>df -h</code>.<br />
|}<br />
====<code>/home</code>: Home file system====<br />
Each user has a directory under /home and is the default working directory when logging in to ARC. Each home directory has a per-user quota of 500 GB. This limit is fixed and cannot be increased. Researchers requiring additional storage exceeding what is available on their home directory may use <code>/work</code> and <code>/scratch</code>.<br />
<br />
Note on file sharing: Due to security concerns, permissions set using <code>chmod</code> on your home directory to allow other users to read/write to it will be automatically reverted by an automated system process unless an explicit exception is made. If you need to share files with other researchers on the ARC cluster, please write to support@hpc.ucalgary.ca to ask for such an exception.<br />
<br />
====<code>/scratch</code>: Scratch file system for large job-oriented storage====<br />
Associated with each job, under the <code>/scratch</code> directory, a subdirectory is created that can be referenced in job scripts as <code>/scratch/${SLURM_JOB_ID}</code>. You can use that directory for temporary files needed during the course of a job. Up to 30 TB of storage may be used, per user (total for all your jobs) in the <code>/scratch</code> file system. <br />
<br />
Data in <code>/scratch</code> associated with a given job will be deleted automatically, without exception, five days after the job finishes.<br />
<br />
====<code>/work</code>: Work file system for larger projects====<br />
If you need more space than provided in <code>/home</code> and the <code>/scratch</code> job-oriented space is not appropriate for your case, please write to support@hpc.ucalgary.ca with an explanation, including an indication of how much storage you expect to need and for how long. If approved, you will then be assigned a directory under <code>/work</code> with an appropriately large quota.<br />
<br />
====<code>/tmp</code>, <code>/dev/shm</code>: Temporary files====<br />
You may use <code>/tmp</code> for temporary files generated by your job. The <code>/tmp</code> is stored on a disk local to the compute node and is not shared across the cluster. The files stored here may be removed immediately after your job terminates.<br />
<br />
<code>/dev/shm</code> is similar to <code>/tmp</code> but the storage is backed by virtual memory for higher IOPS. This is ideal for workloads that require many small read/writes to share data between processes or as a fast cache. The files stored here may be removed immediately after your job terminates.<br />
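<br />
As a minimal sketch of how <code>/dev/shm</code> might be used inside a batch job as a fast cache (the file name, program name and option shown are placeholders for your own workflow):<br />
<syntaxhighlight lang="bash"><br />
# Stage a frequently re-read file into memory-backed storage, point the<br />
# application at the in-memory copy, and remove it before the job ends.<br />
# "lookup_table.db" and "myprogram" are placeholders.<br />
CACHE_DIR=/dev/shm/${SLURM_JOB_ID}<br />
mkdir -p ${CACHE_DIR}<br />
cp /home/${USER}/project/lookup_table.db ${CACHE_DIR}/<br />
myprogram --table ${CACHE_DIR}/lookup_table.db<br />
rm -rf ${CACHE_DIR}<br />
</syntaxhighlight><br />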
<br />
== Using ARC ==<br />
=== Logging in ===<br />
To log in to ARC, connect using SSH to <code>arc.ucalgary.ca</code> on port <code>22</code>. Connections to ARC are accepted only from the University of Calgary network (on campus) or through the University of Calgary General VPN (off campus).<br />
<br />
See [[Connecting to RCS HPC Systems]] for more information.<br />
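<br />
For example, from a terminal on the campus network or the General VPN (replace <code>your_username</code> with your own IT account username):<br />
<syntaxhighlight lang="bash"><br />
# Connect to the ARC login node on the default SSH port (22).<br />
$ ssh your_username@arc.ucalgary.ca<br />
</syntaxhighlight><br />
<br />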
=== How to interact with ARC ===<br />
<br />
The ARC cluster is a collection of several compute nodes connected by a high-speed network. On ARC, computations are submitted as jobs. Once submitted, jobs are assigned to compute nodes by the job scheduler as resources become available.<br />
[[File:Cluster.png]]<br />
<br />
You can access ARC with your UCalgary IT user credentials. Once connected, you will be placed on the ARC login node, which is meant for basic tasks such as submitting jobs, monitoring job status, managing files, and editing text. It is a shared resource to which multiple users are connected at the same time. Thus, intensive tasks are not allowed on the login node, as they may prevent other users from connecting or submitting their computations. <br />
[tannistha.nandi@arc ~]$ <br />
The job scheduling system on ARC is called SLURM. On ARC, there are two SLURM commands that can allocate resources to a job under appropriate conditions: ‘salloc’ and ‘sbatch’. They both accept the same set of command line options with respect to resource allocation. <br />
<br />
'''‘salloc’''' launches an interactive session, typically for tasks under 5 hours. <br />
Once an interactive job session is created, you can do things like explore research datasets, start R or python sessions to test your code, compile software applications etc.<br />
<br />
a. Example 1: The following command requests 1 CPU on 1 node for 1 task, along with 1 GB of RAM, for one hour. <br />
[tannistha.nandi@arc ~]$ salloc --mem=1G -c 1 -N 1 -n 1 -t 01:00:00<br />
salloc: Granted job allocation 6758015<br />
salloc: Waiting for resource configuration<br />
salloc: Nodes fc4 are ready for job<br />
[tannistha.nandi@fc4 ~]$ <br />
<br />
<br />
b. Example 2: The following command requests 1 GPU on 1 node in the gpu-v100 partition, along with 1 GB of RAM, for 1 hour. Generic resource scheduling (--gres) is used to request GPU resources.<br />
[tannistha.nandi@arc ~]$ salloc --mem=1G -t 01:00:00 -p gpu-v100 --gres=gpu:1<br />
salloc: Granted job allocation 6760460<br />
salloc: Waiting for resource configuration<br />
salloc: Nodes fg3 are ready for job<br />
[tannistha.nandi@fg3 ~]$<br />
<br />
Once you finish the work, type 'exit' at the command prompt to end the interactive session,<br />
[tannistha.nandi@fg3 ~]$ exit<br />
[tannistha.nandi@fg3 ~]$ salloc: Relinquishing job allocation 6760460<br />
This ensures that the allocated resources are released from your job and made available to other users.<br />
<br />
'''‘sbatch’''' submits computations as batch jobs to run on the cluster. You can submit a job-script.slurm via 'sbatch' for execution. <br />
[tannistha.nandi@arc ~]$ sbatch job-script.slurm<br />
When resources become available, they get allocated to this task. Batch jobs are suited for tasks that run for long periods of time without any user supervision. When the job-script terminates, the allocation is released. <br />
Please review the section on how to prepare job scripts for more information.<br />
<br />
=== Prepare job scripts ===<br />
Job scripts are text files saved with an extension '.slurm', for example, 'job-script.slurm'. <br />
A job script looks something like this:<br />
''#!/bin/bash''<br />
####### Reserve computing resources #############<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks=1<br />
#SBATCH --cpus-per-task=1<br />
#SBATCH --time=01:00:00<br />
#SBATCH --mem=1G<br />
#SBATCH --partition=cpu2019<br><br />
####### Set environment variables ###############<br />
module load python/anaconda3-2018.12<br><br />
####### Run your script #########################<br />
python myscript.py<br />
<br />
The first line contains the text "#!/bin/bash" to interpret it as a bash script.<br />
<br />
It is followed by lines that start with '#SBATCH' to communicate with SLURM. You may add as many #SBATCH directives as needed to reserve computing resources for your task. The above example requests one CPU on a single node for 1 task, along with 1 GB of RAM, for an hour on the cpu2019 partition.<br />
<br />
Next, you have to set up environment variables, either by loading modules centrally installed on ARC or by exporting the path to software in your home directory. The above example loads an available Python module.<br />
<br />
Finally, include the Linux command to execute the local script.<br />
<br />
Note that failing to specify part of a resource allocation request (most notably '''time''' and '''memory''') will result in bad resource requests as the defaults are not appropriate to most cases. Please refer to the section 'Running non-interactive jobs' for more examples.<br />
<br />
=== Software ===<br />
All ARC nodes run the latest version of CentOS 7 with the same set of base software packages. To maintain the stability and consistency of all nodes, any additional dependencies that your software requires must be installed under your account. For your convenience, we have packaged commonly used software packages and dependencies as modules available under <code>/global/software</code>. If your software package is not available as a module, you may also try Anaconda which allows users to manage and install custom packages in an isolated environment.<br />
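<br />
For instance, a minimal sketch of creating an isolated Anaconda environment in your home directory (the environment name and package list are placeholders; use <code>module avail</code> to check the exact Anaconda module versions available):<br />
<syntaxhighlight lang="bash"><br />
# Load a centrally installed Anaconda module, then create and use a<br />
# personal environment. "myenv" and the packages are placeholders.<br />
$ module load python/anaconda3-2018.12<br />
$ conda create --yes --name myenv numpy pandas<br />
$ source activate myenv<br />
$ python myscript.py<br />
</syntaxhighlight><br />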
<br />
For a list of available packages that have been made available, please see [[ARC Software pages]]. <br />
<br />
Please contact us at support@hpc.ucalgary.ca if you need additional software installed.<br />
<br />
==== Modules ====<br />
The setup of the environment for using some of the installed software is through the <code>module</code> command. An overview of [https://www.westgrid.ca//support/modules modules on WestGrid (external link)] is largely applicable to ARC.<br />
<br />
Software packages bundled as a module will be available under <code>/global/software</code> and can be listed with the <code>module avail</code> command.<br />
<syntaxhighlight lang="bash"><br />
$ module avail<br />
</syntaxhighlight><br />
<br />
To enable Python, load the Python module by running:<br />
<syntaxhighlight lang="bash"><br />
$ module load python/anaconda-3.6-5.1.0<br />
</syntaxhighlight><br />
<br />
To unload the Python module, run:<br />
<syntaxhighlight lang="bash"><br />
$ module remove python/anaconda-3.6-5.1.0<br />
</syntaxhighlight><br />
<br />
To see currently loaded modules, run:<br />
<syntaxhighlight lang="bash"><br />
$ module list<br />
</syntaxhighlight><br />
<br />
By default, no modules are loaded on ARC. If you wish to use a specific module, such as the Intel compilers or the Open MPI parallel programming packages, you must load the appropriate module.<br />
<br />
=== Storage ===<br />
Please review the [[#Storage|Storage]] section above for important policies and advice regarding file storage and file sharing.<br />
<br />
=== Interactive Jobs ===<br />
The ARC login node may be used for such tasks as editing files, compiling programs and running short tests while developing programs. We suggest CPU intensive workloads on the login node be restricted to under 15 minutes as per [[General Cluster Guidelines and Policies|our cluster guidelines]]. For interactive workloads exceeding 15 minutes, use the '''[[Running_jobs#Interactive_jobs|salloc command]]''' to allocate an interactive session on a compute node.<br />
<br />
The default salloc allocation is 1 CPU and 1 GB of memory. Adjust this by specifying <code>-n CPU#</code> and <code>--mem Megabytes</code>. You may request up to 5 hours of CPU time for interactive jobs.<br />
salloc --time 5:00:00 --partition cpu2019<br />
<br />
Always use salloc or srun to start an interactive job. Do not SSH directly to a compute node as SSH sessions will be refused without an active job running.<br />
<br />
<!-- This information doesn't seem that useful or relevant to running interactive jobs. Move to getting started section?<br />
ARC uses the Linux operating system. The program that responds to your typed commands and allows you to run other programs is called the Linux shell. There are several different shells available, but, by default you will use one called bash. It is useful to have some knowledge of the shell and a variety of other command-line programs that you can use to manipulate files. If you are new to Linux systems, we recommend that you work through one of the many online tutorials that are available, such as the [http://www.ee.surrey.ac.uk/Teaching/Unix/index.html UNIX Tutorial for Beginners (external link)] provided by the University of Surrey. The tutorial covers such fundamental topics, among others, as creating, renaming and deleting files and directories, how to produce a listing of your files and how to tell how much disk space you are using. For a more comprehensive introduction to Linux, see [http://linuxcommand.sourceforge.net/tlcl.php The Linux Command Line (external link)].<br />
--><br />
<br />
=== Running non-interactive jobs (batch processing) ===<br />
Production runs and longer test runs should be submitted as (non-interactive) batch jobs, in which commands to be executed are listed in a script (text file). Batch jobs scripts are submitted using the <code>sbatch</code> command, part of the Slurm job management and scheduling software. #SBATCH directive lines at the beginning of the script are used to specify the resources needed for the job (cores, memory, run time limit and any specialized hardware needed).<br />
<br />
Most of the information on the [https://docs.computecanada.ca/wiki/Running_jobs Running Jobs (external link)] page on the Compute Canada web site is also relevant for submitting and managing batch jobs and reserving processors for interactive work on ARC. One major difference between running jobs on the ARC and Compute Canada clusters is in selecting the type of hardware that should be used for a job. On ARC, you choose the hardware to use primarily by specifying a partition, as described below.<br />
<br />
=== Selecting a Partition ===<br />
There are some aspects to consider when selecting a partition including:<br />
* Resource requirements in terms of memory and CPU cores<br />
* Hardware specific requirements, such as GPU or CPU Instruction Set Extensions<br />
* Partition resource limits and potential wait time<br />
* Software support parallel processing using Message Passing Interface (MPI), OpenMP, etc.<br />
** E.g., since MPI-based parallel processing can distribute memory across multiple nodes, per-node memory requirements could be lower, whereas OpenMP or single-process code that is restricted to one node would require a higher-memory node.<br />
** Note: MPI code running on hardware with Omni-Path networking should be compiled with Omni-Path networking support. This is provided by loading the <code>openmpi/2.1.3-opa</code> or <code>openmpi/3.1.2-opa</code> modules prior to compiling.<br />
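<br />
For example, a minimal sketch of compiling a small MPI program for the Omni-Path partitions (<code>hello_mpi.c</code> is a placeholder source file):<br />
<syntaxhighlight lang="bash"><br />
# Load an Omni-Path-enabled Open MPI module before compiling MPI code<br />
# that will run on the Omni-Path partitions (e.g. cpu2019).<br />
$ module load openmpi/3.1.2-opa<br />
$ mpicc -O2 -o hello_mpi hello_mpi.c<br />
<br />
# In the batch job script, launch across the allocated tasks with srun:<br />
srun ./hello_mpi<br />
</syntaxhighlight><br />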
<br />
Since resources that are requested are reserved for your job, please request only as much CPU and memory as your job requires to avoid reducing the cluster efficiency. If you are unsure which partition to use or the specific resource requests that are appropriate for your jobs, please contact us at [mailto:support@hpc.ucalgary.ca support@hpc.ucalgary.ca] and we would be happy to work with you.<br />
<br />
{| class="wikitable" style="width: 100%;"<br />
!Partition<br />
!Description<br />
!Cores/node<br />
!Memory Request Limit<br />
!Time Limit<br />
!GPU<br />
!Networking<br />
|-<br />
|cpu2019<br />
|General Purpose Compute<br />
|40<br />
|185,000 MB<br />
|7 days ‡<br />
|<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|bigmem<br />
|Big Memory Compute<br />
|80<br />
|3,000,000 MB<br />
|24 hours ‡<br />
|<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|gpu-v100<br />
|GPU Compute<br />
|80<br />
|753,000 MB<br />
|24 hours ‡<br />
|2<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|apophis&dagger;<br />
|Private Research Partition<br />
|40<br />
|185,000 MB<br />
|7 days ‡<br />
|<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|apophis-bf&dagger;<br />
|Back-fill Compute<br />
|40<br />
|185,000 MB<br />
|5 hours ‡<br />
|<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|razi&dagger;<br />
|Private Research Partition<br />
|40<br />
|185,000 MB<br />
|7 days ‡<br />
|<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|razi-bf&dagger;<br />
|Back-fill Compute<br />
|40<br />
|185,000 MB<br />
|5 hours ‡<br />
|<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|pawson&dagger;<br />
|Private Research Partition<br />
|40<br />
|185,000 MB<br />
|7 days ‡<br />
|<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|pawson-bf&dagger;<br />
|Back-fill Compute<br />
|40<br />
|185,000 MB<br />
|5 hours ‡<br />
|<br />
|100 Gbit/s Omni-Path<br />
|-<br />
|theia&dagger;<br />
|Private Research Partition<br />
|28<br />
|188,000 MB<br />
|7 days ‡<br />
|<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|theia-bf&dagger;<br />
|Back-fill Compute<br />
|28<br />
|188,000 MB<br />
|5 hours ‡<br />
|<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|synergy&dagger;<br />
|Private Research Partition<br />
|14<br />
|245,000 MB<br />
|7 days ‡<br />
|<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|synergy-bf&dagger;<br />
|Back-fill Compute<br />
|14<br />
|245,000 MB<br />
|5 hours ‡<br />
|<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|cpu2013<br />
|Legacy General Purpose Compute<br />
|16<br />
|120000<br />
|7 days ‡<br />
|<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|lattice<br />
|Legacy General Purpose Compute<br />
|8<br />
|12000<br />
|7 days ‡<br />
|<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|parallel<br />
|Legacy General Purpose Compute<br />
|12<br />
|23000<br />
|7 days ‡<br />
|<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|single<br />
|Legacy Single-Node Job Compute<br />
|8<br />
|12000<br />
|7 days ‡<br />
|<br />
|40 Gbit/s InfiniBand<br />
|-<br />
|+ style="caption-side: bottom; text-align: left; font-weight: normal;" | &dagger; These partitions contain hardware contributed to ARC by particular researchers and should only be used by members of their research groups. However, they have generously allowed their compute nodes to be shared with others outside their research groups for short jobs. A special 'back-fill' or -bf partition is available for use by all ARC users for jobs shorter than 5 hours.<br /><br />
‡ As time limits may be changed by administrators to adjust to maintenance schedules or system load, the values given in the tables are not definitive. See the Time limits section below for commands you can use on ARC itself to determine current limits.<br />
|}<br />
<br />
==== Hardware resource and job policy limits ====<br />
In addition to the hardware limitations, please be aware that there may also be policy limits imposed on your account for each partition. These limits restrict the number of cores, nodes, or GPUs that can be used at any given time. Since the limits are applied on a partition-by-partition basis, using resources in one partition should not affect the available resources you can use in another partition.<br />
<br />
These limits can be listed by running:<br />
<syntaxhighlight lang="bash"><br />
$ sacctmgr show qos format=Name,MaxWall,MaxTRESPU%20,MaxSubmitJobs<br />
Name MaxWall MaxTRESPU MaxSubmit<br />
---------- ----------- -------------------- ---------<br />
normal 7-00:00:00 2000<br />
breezy 3-00:00:00 cpu=384 2000<br />
gpu 7-00:00:00 13000<br />
cpu2019 7-00:00:00 cpu=240 2000<br />
gpu-v100 1-00:00:00 cpu=80,gres/gpu=4 2000<br />
single 7-00:00:00 cpu=408,node=75 2000<br />
razi 7-00:00:00 2000<br />
</syntaxhighlight><br />
<br />
==== Specifying a partition in a job ====<br />
Once you have decided which partition best suits your computation, you can select one or more partitions on a job-by-job basis by including the <code>partition</code> keyword in an <code>SBATCH</code> directive in your batch job. Multiple partitions should be comma-separated. If you omit the partition specification, the system will try to assign your job to appropriate hardware based on other aspects of your request. <br />
<br />
In some cases, you really should specify the partition explicitly. For example, if you are running single-node jobs with thread-based parallel processing requesting 8 cores you could use:<br />
<syntaxhighlight lang="bash"><br />
#SBATCH --mem=0 ❶<br />
#SBATCH --nodes=1 ❷<br />
#SBATCH --ntasks=1 ❸<br />
#SBATCH --cpus-per-task=8 ❹<br />
#SBATCH --partition=single,lattice ❺ <br />
</syntaxhighlight><br />
<br />
A few things to mention in this example:<br />
# <code>--mem=0</code> allocates all available memory on the compute node for the job. This effectively allocates the entire node for your job.<br />
# <code>--nodes=1</code> allocates 1 node for the job<br />
# <code>--ntasks=1</code> your job has a single task<br />
# <code>--cpus-per-task=8</code> asks for 8 CPUs per task. This job in total will request 8 * 1, or 8 CPUs.<br />
# <code>--partition=single,lattice</code> specifies that this job can run on either single or lattice.<br />
Suppose that your job requires at most 8 CPU cores and 10 GB of memory. The above Slurm request would be valid and optimal, since your job fits neatly on a single node in the single or lattice partitions. However, if you failed to specify the partition, Slurm may try to schedule your job to a partition with larger nodes, such as cpu2019, where each node has 40 cores and 190 GB of memory. If your job is scheduled on such a node, it will effectively waste 32 cores and 180 GB of memory, because <code>--mem=0</code> not only requests all 190 GB on that node, but also prevents other jobs from being scheduled on the same node.<br />
<br />
If you don't specify a partition, please give greater thought to the memory specification to make sure that the scheduler will not assign your job more resources than are needed.<br />
<br />
Parameters such as '''--ntasks-per-node''', '''--cpus-per-task''', '''--mem''' and '''--mem-per-cpu''' also have to be adjusted according to the capabilities of the hardware. The product of --ntasks-per-node and --cpus-per-task should be less than or equal to the number given in the "Cores/node" column. The '''--mem''' parameter (or the product of '''--mem-per-cpu''' and '''--cpus-per-task''') should be less than the "Memory limit" shown. If using whole nodes, you can specify '''--mem=0''' to request the maximum amount of memory per node.<br />
<br />
===== Examples =====<br />
Here are some examples of specifying the various partitions.<br />
<br />
As mentioned in the [[#Hardware|Hardware]] section above, the ARC cluster was expanded in January 2019. To select the 40-core general purpose nodes specify:<br />
<br />
#SBATCH --partition=cpu2019<br />
<br />
To run on the Tesla V100 GPU-enabled nodes, use the '''gpu-v100''' partition. You will also need to include an SBATCH directive in the form '''--gres=gpu:n''' to specify the number of GPUs, n, that you need. For example, if the software you are running can make use of both GPUs on a gpu-v100 partition compute node, use:<br />
<br />
#SBATCH --partition=gpu-v100 --gres=gpu:2<br />
<br />
For very large memory jobs (more than 185000 MB), specify the bigmem partition:<br />
<br />
#SBATCH --partition=bigmem<br />
<br />
If the more modern computers are too busy or you have a job well-suited to run on the compute nodes described in the legacy hardware section above, choose the cpu2013, Lattice or Parallel compute nodes by specifying the corresponding partition keyword:<br />
<br />
#SBATCH --partition=cpu2013<br />
#SBATCH --partition=lattice<br />
#SBATCH --partition=parallel<br />
<br />
There is an additional partition called '''single''' that provides nodes similar to the lattice partition, but, is intended for single-node jobs. Select the single partition with<br />
<br />
#SBATCH --partition=single<br />
<br />
=== Time limits ===<br />
Use the <code>--time</code> directive to tell the job scheduler the maximum time that your job might run. For example:<br />
#SBATCH --time=hh:mm:ss<br />
<br />
You can use <code>scontrol show partitions</code> or <code>sinfo</code> to see the current maximum time that a job can run.<br />
<syntaxhighlight lang="bash" highlight="6"><br />
$ scontrol show partitions<br />
PartitionName=single <br />
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL <br />
AllocNodes=ALL Default=NO QoS=single <br />
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO <br />
MaxNodes=UNLIMITED MaxTime=7-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED <br />
Nodes=cn[001-168] <br />
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO <br />
OverTimeLimit=NONE PreemptMode=OFF <br />
State=UP TotalCPUs=1344 TotalNodes=168 SelectTypeParameters=NONE <br />
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED <br />
</syntaxhighlight><br />
<br />
Alternatively, with <code>sinfo</code> under the <code>TIMELIMIT</code> column:<br />
<syntaxhighlight lang="bash"><br />
$ sinfo <br />
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST <br />
single up 7-00:00:00 1 drain* cn097 <br />
single up 7-00:00:00 1 maint cn002 <br />
single up 7-00:00:00 4 drain* cn[001,061,133,154] <br />
...<br />
</syntaxhighlight><br />
<br />
== Support ==<br />
{{Message Box<br />
|title=[[Support|Need Help or have other ARC Related Questions?]]<br />
|message=For all general RCS related issues, questions, or comments, please contact us at support@hpc.ucalgary.ca.<br />
|icon=Support Icon.png}}<br />
<br />
Please don't hesitate to [[Support|contact us]] directly by email if you need help using ARC or require guidance on migrating and running your workflows to ARC.<br />
<br />
[[Category:ARC]]<br />
[[Category:Guides]]</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=Storage_Options&diff=1439Storage Options2021-07-09T14:23:39Z<p>Tthomas: OneDrive backup link</p>
<hr />
<div>There are a few options researchers can take advantage of when storing their research data. <br />
<br />
== Data Classification ==<br />
Please review the different data classifications that are outlined by the [https://ucalgary.ca/legal-services/sites/default/files/teams/1/Standards-Legal-Information-Security-Classification-Standard.pdf Information Security Classification Standard]. There are 4 levels of data classification which are summarized in the table below.<br />
<br />
{| class="wikitable"<br />
! Level<br />
! Description<br />
! Example<br />
|-<br />
| Level 1<br />
| Public<br />
|<br />
* Reference data sets<br />
* Published research data<br />
|-<br />
| Level 2<br />
| Internal<br />
|<br />
* Internal memos<br />
* Unpublished research data<br />
* Anonymized or de-identified human subject data<br />
* Library transactions and journals<br />
|-<br />
| Level 3<br />
| Confidential<br />
|<br />
* Faculty/staff employment applications, personnel files, contact information<br />
* Donor or prospective donor information<br />
* Contracts<br />
* Intellectual property<br />
|-<br />
| Level 4<br />
| Restricted<br />
|<br />
* Patient identifiable health information<br />
* identifiable human subject research data<br />
* information subject to special government requirements<br />
|}<br />
<br />
When selecting a storage option, you must use one that meets or exceeds the rated security classification.<br />
<br />
== Research Data Management ==<br />
We recommend you follow good Research Data Management practices and ensure you have a DMP (Data Management Plan) created to guide your data's lifecycle. DMP Assistant has been created specifically for Canadian scholars and aims to meet any and all Tri-Agency requirements. See: https://assistant.portagenetwork.ca/<br />
<br />
Your DMP can help us support the FAIR (findable, accessible, interoperable and reusable) principles for data management.<br />
<br />
Please consider contacting Libraries and Cultural Resources for assistance. For guidance on general data management and developing a DMP, consult https://library.ucalgary.ca/guides/researchdatamanagement or contact research.data@ucalgary.ca.<br />
<br />
For support using PRISM Dataverse, UofC's institutional data repository, contact digitize@ucalgary.ca.<br />
<br />
If you need to share and preserve your large post-publication data set for a mandated period of time, please visit https://www.frdr-dfdr.ca/repo/ in order to learn more about the national Federated Research Data Repository. <br />
<br />
FRDR aligns with Tri-Agency Principles as a platform for Preservation, Retention and Sharing of research data. see: [http://www.science.gc.ca/eic/site/063.nsf/eng/h_83F7624E.html Tri-Agency Statement of Principles on Digital Data Management]<br />
<br />
== Secure Compute Data Storage (SCDS) ==<br />
Secure Computing Data Storage (SCDS) is a service provided by Research Computing Services that allows researchers to store restricted and confidential data. Collaboration with Level 4 data stored in SCDS is possible using ShareFile, a secure file sharing and collaboration tool by Citrix.<br />
<br />
{| class="wikitable"<br />
! Capacity<br />
| 10 GB or more<br />
|-<br />
! Classification<br />
| Level 4<br />
|-<br />
! Learn More<br />
| Visit [https://it.ucalgary.ca/secure-computing-platform The SCDS Website]<br />
|-<br />
! Request Access<br />
| Visit [https://ucalgary.service-now.com/kb_view.do?sysparm_article=KB0030163 ServiceNow to request access]<br />
|}<br />
<br />
== AcademicFS ==<br />
AcademicFS is a UofC hosted SMB/CIFS storage solution funded and operated by RCS. It is available by request to faculty and staff with active research data.<br />
{| class="wikitable"<br />
! Capacity<br />
| 100GB with quota increases available on request. <br />
|-<br />
! Classification<br />
| Level 1 - 2<br />
|-<br />
! Request Access<br />
| Visit [https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=fe66b3a7db297300897e4b8b0b96199d ServiceNow to request access]<br />
|}<br />
=== Service Description ===<br />
You may use AcademicFS to store your active research data files. AcademicFS is intended to be used as a research group or project share. AcademicFS is available on campus or off campus using the IT supported VPN client. Information on how to download and install the VPN client can be found here: https://ucalgary.service-now.com/it?id=kb_article&sys_id=880e71071381ae006f3afbb2e144b05c (IT account login may be required).<br />
All AcademicFS users must have a UofC IT account.<br />
<br />
=== Data recovery ===<br />
AcademicFS takes daily snapshots shortly after midnight and keeps them for 30 days. You should be able to recover a deleted file for up to 30 days, provided it was in your share overnight. If you create and delete a file within the same day, no snapshot will be available to recover it from. AcademicFS presents snapshots using the Windows 'Previous Versions' functionality. If you are not familiar with using this, or if you are on a Linux or macOS device, you can request a restore through ServiceNow.<br />
<br />
For backup, we replicate changes to a distant data center every hour. The storage hardware which hosts your data is located in the basement of the Math Sciences building and our backup is ~22KM distant, so in case of an on campus disaster, your data should be safe.<br />
<br />
=== Support for AcademicFS ===<br />
If you have questions, please contact the IT Support Centre.<br />
: Mon – Fri: 8:30 am – 5:00 pm; Sat, Sun & holidays: 10:00 am – 2:00 pm.<br />
: Live Chat: ucalgary.ca/it<br />
: Email: itsupport@ucalgary.ca<br />
: Phone: 403.220.5555<br />
: In person: 773 Math Science<br />
<br />
<br />
<br />
== OneDrive for Business ==<br />
OneDrive for Business is a storage solution provided by Microsoft and is available by request to all faculty and staff.<br />
{| class="wikitable"<br />
! Capacity<br />
| 5 TB<br />
|-<br />
! Classification<br />
| Level 1 - 4<br />
|-<br />
! Request Access<br />
| Visit [https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=522b68ebdb83e700897e4b8b0b961997 ServiceNow to request access]<br />
|}<br />
<br />
You may use OneDrive for Business to store your personal and work-related files. <br />
Files stored within OneDrive are private to you by default, but you can optionally share them and collaborate with others. <br />
OneDrive for Business cannot be used as a department or project share space; there is no group or lab offering with OneDrive. <br />
<br />
<br />
While OneDrive provides a secure and compliant location from an IT security standpoint, <br />
it is not an adequate location for data that the PI remains accountable for until five years after completion of the study. <br />
This is not a security issue, but a data management issue.<br />
OneDrive can work well for short-term needs, <br />
but '''SCDS''' (see above) is the best option for storing all research-related electronic records over the long term. <br />
For example, if a study stored all of its records in the personal OneDrive of one of the researchers <br />
and that researcher left the university, the OneDrive would be gone within 30 days of their departure. <br />
That cannot happen with SCDS, where we track discrete containers that remain associated with the PI and the REB project as one entity.<br />
<br />
<br />
Microsoft provides automation capabilities for its Office 365 products.<br />
On a Windows machine, you can use Power Automate (formerly 'Flow') to copy a file to a local file system whenever a new file is created in OneDrive.<br />
<br />
To back up data residing on ARC to your personal OneDrive allocation please see: [[How to transfer data#rclone: rsync for cloud storage]]<br />
<br />
OneDrive requires Multi-Factor Authentication (MFA) to be enabled on your University of Calgary IT account. <br />
<br />
<br />
UofC OneDrive data is reportedly hosted in Canada (Markham, Ontario).<br />
<br />
===Request Access===<br />
To request OneDrive for Business:<br />
#Submit your request on ServiceNow using the OneDrive for Business request form (https://ucalgary.service-now.com/it?id=sc_cat_item&sys_id=522b68ebdb83e700897e4b8b0b961997)<br />
#The IT Support Centre will contact you<br />
#Set a time with IT Support Centre to turn on MFA<br />
# Turn on MFA<br />
#Turn on OneDrive for Business<br />
<br />
===Support for OneDrive for Business===<br />
If you have questions, please contact the IT Support Centre.<br />
: Mon – Fri: 8:30 am – 5:00 pm; Sat, Sun & holidays: 10:00 am – 2:00 pm.<br />
:Live Chat: ucalgary.ca/it<br />
:Email: itsupport@ucalgary.ca<br />
: Phone: 403.220.5555<br />
:In person: 773 Math Science<br />
<br />
===Data recovery===<br />
<br />
===Other Resources===<br />
For more information on OneDrive for Business:<br />
* Operating Level of Agreement KB0032404 (https://ucalgary.service-now.com/it?id=kb_article&sys_id=7f57bddcdb56a3047cab5068dc9619b6)<br />
*OneDrive for Business Getting Started KB0032351 (https://ucalgary.service-now.com/it?id=kb_article&sys_id=60994170db2da7487cab5068dc961900)<br />
<br />
Any questions regarding whether data hosted on OneDrive is subject to US-jurisdiction discovery or access should be directed to:<br />
*https://cumming.ucalgary.ca/research-institutes/csm-research-services/legal-research-services (CSM researchers)<br />
*https://research.ucalgary.ca/contact/research-services (non-CSM researchers)<br />
*https://www.ucalgary.ca/legalservices/ (for teaching/learning and other non-research enquiries that make their way to you)<br />
<br />
==Office365 SharePoint for research groups==<br />
<br />
To be determined....<br />
<br />
Researchers will be able to request an Office 365 SharePoint site for their group at some point in the future, <br />
which could serve as a group cloud-sharing platform.<br />
<br />
==Personal storage options==<br />
For personal or Level 1 data, you may use external solutions from WestGrid or Compute Canada.<br />
<br />
*'''WestGrid ownCloud''':<br />
:Information: https://www.westgrid.ca/resources_services/data_storage/cloud_storage<br />
:Youtube video introduction: https://www.youtube.com/watch?time_continue=6&v=szPNNySx_Hk&feature=emb_logo<br />
:Access portal: https://owncloud.westgrid.ca/<br />
*'''Compute Canada NextCloud''':<br />
:https://nextcloud.computecanada.ca<br />
<br />
[[Category:Guides]]</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=How_to_transfer_data&diff=1352How to transfer data2021-06-02T22:04:38Z<p>Tthomas: /* rclone: rsync for cloud storage */</p>
<hr />
<div>{{Message Box<br />
|title=Data Transfer Nodes<br />
|message=<br />
For performance and resource reasons, file transfers should be performed on the data transfer node arc-dtn.ucalgary.ca rather than on the ARC login node. Since the ARC DTN has the same shares as ARC, any files you transfer to the DTN will also be available on ARC.<br />
}}<br />
<br />
<br />
= Command Line File Transfer Tools =<br />
You may use the following command-line file transfer utilities on Linux, MacOS, and Windows. File transfers using these methods require your computer to be on the University of Calgary campus network or connected via the University of Calgary IT General VPN.<br />
<br />
If you are working on a Windows computer, you will need to install these utilities separately as they are not installed by default. Newer versions of Windows 10 (1903 and up) have '''SSH''' built in as part of the '''OpenSSH''' package. However, you may be better off using one of the [[#Graphical File Transfer Tools|graphical file transfer]] tools listed in the following section.<br />
<br />
== <code>scp</code>: Secure Copy ==<br />
<code>scp</code> is a secure and encrypted method of transferring files between machines via SSH. It is available on Linux and Mac computers by default and can be installed on Windows by installing the OpenSSH package.<br />
<br />
The general format for the command is:<br />
$ scp [options] source destination<br />
<br />
* The <code>source</code> and <code>destination</code> fields can be a local file / directory or a remote one.<br />
* The ''local location'' is a normal Unix path, absolute or relative. <br />
* The ''remote location'' has a format <code>username@remote.system.name:file/path</code>.<br />
* The ''remote relative file path'' is relative to the home directory of the <code>username</code> on the remote system.<br />
<br />
You may see all the available options with <code>scp</code> by viewing the [http://man7.org/linux/man-pages/man1/scp.1.html man page].<br />
<br />
=== Example Usage ===<br />
Common operations are given below. On your desktop, to:<br />
* Transfer a single file (eg. <code>data.dat</code>) to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp data.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Transfer all files ending with <code>.dat</code> to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp *.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* To transfer an entire directory to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp -r my_data_directory/ username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
<br />
== <code>rsync</code> ==<br />
<code>rsync</code> is a utility for transferring and synchronizing files efficiently. Its efficiency comes from its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files at the destination. <br />
<br />
<code>rsync</code> can be used to copy files and directories locally on a system or between two computers via SSH. Unlike <code>scp</code>, it is designed to synchronize two locations, so a partial transfer can be restarted by re-running <code>rsync</code> without losing progress; resuming a partial transfer is not possible with <code>scp</code>.<br />
<br />
The general format for the command is similar to '''scp''':<br />
$ rsync [options] source destination<br />
* The <code>source</code> and <code>destination</code> fields can be a local file / directory or a remote one.<br />
* <code>rsync</code> '''cannot''' transfer files between '''two remote''' locations. The source or the destination must be a local path.<br />
* The ''local location'' is a normal Unix path, absolute or relative. <br />
* The ''remote location'' has a format <code>username@remote.system.name:file/path</code>.<br />
* The ''remote relative file path'' is relative to the home directory of the <code>username</code> on the remote system.<br />
<br />
You may see all the available options with <code>rsync</code> by viewing the [http://man7.org/linux/man-pages/man1/rsync.1.html man page].<br />
<br />
=== Example Usage ===<br />
Common operations are given below. On your desktop, to:<br />
<br />
* Upload a single file (eg. <code>data.dat</code>) from your workstation to your ARC: <syntaxhighlight lang="bash"><br />
desktop$ rsync -v data.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Upload all files matching a wildcard (eg. ending in <code>*.dat</code>): <syntaxhighlight lang="bash"><br />
$ rsync -v *.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Upload an entire directory (eg. <code>my_data</code> to <code>~/projects/project2</code>): <syntaxhighlight lang="bash"><br />
$ rsync -axv my_data username@arc-dtn.ucalgary.ca:~/projects/project2/<br />
</syntaxhighlight><br />
* Upload more than one directory: <syntaxhighlight lang="bash"><br />
desktop$ rsync -axv my_data1 my_data2 my_data3 username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Download one file (eg. <code>output.dat</code>) from ARC to the current directory on your workstation: <syntaxhighlight lang="bash"><br />
## Note the '.' at the end of the command which references the current working directory on your computer<br />
desktop$ rsync -v username@arc-dtn.ucalgary.ca:projects/project1/output.dat .<br />
</syntaxhighlight><br />
* Download one directory (eg. <code>outputs</code>) from ARC to the current directory on your workstation:<syntaxhighlight lang="bash"><br />
desktop$ rsync -axv username@arc-dtn.ucalgary.ca:projects/project1/outputs .<br />
</syntaxhighlight><br />
<br />
== <code>sftp</code>: secure file transfer protocol ==<br />
<br />
* Manual page on-line: http://man7.org/linux/man-pages/man1/sftp.1.html<br />
<br />
<br />
'''sftp''' is a file transfer program, similar to '''ftp''', which performs all operations over an encrypted '''ssh''' transport. It may also use many features of '''ssh''', such as public key authentication and compression.<br />
<br />
'''sftp''' has an interactive mode, in which it understands a set of commands similar to those of '''ftp'''. Commands are case insensitive.<br />
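<br />
A minimal interactive session might look like the following (the file names here are examples only):<br />
<pre><br />
# Connect to the ARC data transfer node.<br />
desktop$ sftp username@arc-dtn.ucalgary.ca<br />
<br />
# Upload a local file to the current remote directory.<br />
sftp> put data.dat<br />
<br />
# Download a remote file to the current local directory.<br />
sftp> get results/output.dat<br />
<br />
# List the remote directory, list the local directory, then quit.<br />
sftp> ls<br />
sftp> lls<br />
sftp> exit<br />
</pre><br />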
<br />
== <code>rclone</code>: rsync for cloud storage ==<br />
'''Rclone''' is a command line program to sync files and directories to and from a number of on-line storage services.<br />
<br />
* https://rclone.org/<br />
You may back up your data from arc-dtn to your personal 5 TB UCalgary OneDrive to create a safe second copy at a distance.<br />
<br />
[https://rcs.ucalgary.ca/images/8/8e/Rclone_and_OneDrive_on_arc.pdf detailed rclone configuration instructions]<br />
<br />
Please note: if you are syncing your OneDrive with a PC or Mac, your new backup of your ARC home directory may be auto-replicated to your computer. You can exclude it from replication in the PC or Mac OneDrive client (Help & Settings -> Settings -> Account -> Choose folders).<br />
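<br />
As a rough sketch, once a remote has been set up with <code>rclone config</code> (following the instructions linked above), a backup might look like the commands below. The remote name <code>onedrive</code> is just an example; use whatever name you chose during configuration. <syntaxhighlight lang="bash"><br />
# List the remotes you have configured.<br />
$ rclone listremotes<br />
<br />
# Copy a project directory to a folder on the remote, showing progress.<br />
$ rclone copy ~/project1 onedrive:arc-backup/project1 --progress<br />
<br />
# Re-running with 'sync' makes the remote match the source exactly.<br />
# Caution: 'sync' deletes files on the remote that no longer exist locally.<br />
$ rclone sync ~/project1 onedrive:arc-backup/project1 --progress<br />
</syntaxhighlight><br />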
<br />
== <code>curl</code> and <code>wget</code>: downloading from the internet ==<br />
To download a file with the ability to resume a partial download:<br />
 curl -C - -O http://example.com/resource.tar.gz <br />
wget -c http://example.com/resource.tar.gz<br />
<br />
<br />
= Graphical File Transfer Tools =<br />
== FileZilla ==<br />
FileZilla is a free cross-platform file transfer program that can transfer files via FTP and SFTP. <br />
<br />
=== Installation ===<br />
You may obtain Filezilla from the project's official website at: https://filezilla-project.org/download.php?type=client. '''Please note''': The official installer may bundle ads and unwanted software. Be careful when clicking through.<br />
<br />
Alternatively, you may obtain Filezilla from Ninite: https://ninite.com/filezilla<br />
<br />
=== Connecting to ARC ===<br />
If working off campus, first connect to the University of Calgary General VPN. Open Filezilla and connect to <code>arc-dtn.ucalgary.ca</code> on port <code>22</code>.<br />
[[File:Filezilla.jpg|alt=Connecting to ARC using Filezilla|none|thumb|Connecting to ARC using Filezilla]] <br />
<br />
== MobaXterm (Windows) ==<br />
'''MobaXterm''' is the recommended tool for remote access and data transfer on '''Windows'''.<br />
<br />
MobaXterm is a one-stop solution for most remote access work on a compute cluster or a Unix / Linux server.<br />
<br />
It provides many Unix-like utilities for Windows, including an '''SSH''' client and an '''X11''' graphics server, as well as a graphical interface for data transfer operations.<br />
<br />
* Website: https://mobaxterm.mobatek.net/<br />
<br />
== WinSCP (Windows) ==<br />
WinSCP is a free Windows file transfer tool.<br />
<br />
https://winscp.net/eng/index.php<br />
<br />
= Cloud-based File Transfer Services =<br />
== Globus File Transfer ==<br />
Globus File Transfer is a cloud-based service for file transfer and file sharing. It uses GridFTP for high-speed, reliable data transfers.<br><br />
<br />
=== How to get started ===<br />
<br />
# Navigate to the web page https://www.globusid.org <br />
# Create a Globus ID and password using your Google account.<br />
<br />
=== Use Globus Web Application to transfer files ===<br />
<br />
# To initiate data transfers using the Globus Web Application, navigate to https://www.globusid.org/login and log into your Globus account <br />
# On the left panel, click on File Manager to define/select the endpoints. For example, to transfer data from the ARC cluster (endpoint 1) to the Compute Canada Cedar cluster (endpoint 2): <br />
#* Under Collection, for the ARC data transfer node, choose endpoint 1 as 'ucalgary#arc-dtn.ucalgary.ca' from the drop-down menu. Authenticate your access using your UCalgary IT credentials. This will bring you to your home directory on ARC. <br />
#* Next, for the Compute Canada Cedar data transfer node, choose endpoint 2 as 'computecanada#cedar-dtn' from the drop-down menu. Again, authenticate your access using your Compute Canada credentials. Navigate to the location where you want to transfer the file. <br />
#* Select the file to be transferred from endpoint 1 and initiate the transfer process. <br />
<br />
=== Use Globus Web Application to Share Files ===<br />
You may grant access to a folder in your allocation on ARC to be used for uploading or sharing files with any individual who has a Globus account, either through Compute Canada or their own institution. <br />
<br />
Please see https://docs.globus.org/how-to/share-files/<br />
<br />
Here is a screen capture with an example using arc-dtn.ucalgary.ca <br />
[[File:DmitriFolder3.png|none|thumb|645x645px|The user tthomas created a folder on ARC:/home/tthomas/dmitriFolder using the Globus Web application and shared it with individuals at external organizations.]]<br />
<br />
<br />
<br />
tannistha.nandi@gmail has read and write access; ipercel@computecanada has read-only access.<br />
<br />
The external identity rozmanov@globusid.org has been granted the ability to assist with adding/administering access.<br />
[[File:DmitriFolder2.png|none|thumb|577x577px|please note: the owner of an ARC allocation is accountable for any activities of the individuals with access granted via GLOBUS. ]]<br />
<br />
<br />
rozmanov@globusid.org may add research collaborators to this allocation on ARC in order to share data.<br />
<br />
= Special cases =<br />
<br />
== Transferring Large Datasets ==<br />
=== Using screen and rsync ===<br />
If you want to transfer a large amount of data from a remote Unix system to ARC, you can use <code>rsync</code> to handle the transfer. However, you would have to keep the SSH session from your workstation connected during the entire transfer, which is often inconvenient or infeasible. <br />
<br />
To overcome this, one can run the '''rsync''' transfer inside a '''screen''' virtual session on ARC. '''screen''' creates a detachable terminal session that keeps running on ARC and allows reconnection from later SSH sessions from your workstation.<br />
<br />
To begin, log in to ARC and start <code>screen</code> with the screen command:<br />
<pre><br />
# Login to ARC<br />
$ ssh username@arc.ucalgary.ca<br />
<br />
# Start a screen session<br />
$ screen<br />
<br />
# While in the new screen session, start the transfer with rsync.<br />
$ rsync -axv ext_user@external.system:path/to/remote/data .<br />
<br />
# Disconnect from the screen session with the hotkey 'Ctrl-a d'<br />
# You may now disconnect from ARC, close the lid of your laptop, or turn off the computer.<br />
</pre><br />
<br />
To check whether the transfer has finished:<br />
<pre><br />
# Login to ARC<br />
$ ssh username@arc.ucalgary.ca<br />
<br />
# Reconnect to the screen session<br />
$ screen -r<br />
<br />
# If the transfer has finished, close the screen session.<br />
$ exit<br />
<br />
# If the transfer is still running, disconnect from the screen session with the hotkey 'Ctrl-a d'<br />
</pre><br />
<br />
=== Very large files ===<br />
If the files are large and the transfer speed is low, the transfer may fail before a file has been completely transferred. <br />
'''rsync''' may not help here, as by default it will not resume the transfer of a partially transferred file (we have not tested this recently).<br />
<br />
The solution may be to split the large file into smaller chunks, transfer them using rsync and then join them on the remote system (ARC for example):<br />
<br />
<pre><br />
# Large file is 506MB in this example.<br />
$ ls -l t.bin<br />
-rw-r--r-- 1 drozmano drozmano 530308481 Jun 8 11:06 t.bin<br />
<br />
# split the file:<br />
$ split -b 100M t.bin t.bin_chunk.<br />
<br />
# Check the chunks.<br />
$ ls -l t.bin_chunk.*<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.aa<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ab<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ac<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ad<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ae<br />
-rw-r--r-- 1 drozmano drozmano 6020481 Jun 8 11:09 t.bin_chunk.af<br />
<br />
# Transfer the files:<br />
$ rsync -axv t.bin_chunk.* username@arc.ucalgary.ca:<br />
</pre><br />
<br />
Then login to ARC and join the files:<br />
<pre><br />
$ cat t.bin_chunk.* > t.bin<br />
<br />
$ ls -l <br />
-rw-r--r-- 1 drozmano drozmano 530308481 Jun 8 11:06 t.bin<br />
</pre><br />
Success.<br />
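<br />
To confirm the chunks were joined correctly, you can compare checksums of the original file and the reassembled file on the two systems (shown here with <code>md5sum</code>; any checksum tool will do):<br />
<pre><br />
# On the source system:<br />
$ md5sum t.bin<br />
<br />
# On ARC, after joining the chunks -- the two hashes should match:<br />
$ md5sum t.bin<br />
</pre><br />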
<br />
[[Category:Guides]]</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=How_to_transfer_data&diff=1351How to transfer data2021-06-02T21:32:57Z<p>Tthomas: /* rclone: rsync for cloud storage */</p>
<hr />
<div>{{Message Box<br />
|title=Data Transfer Nodes<br />
|message=<br />
For performance and resource reasons, file transfers should be performed on the data transfer node arc-dtn.ucalgary.ca rather than on the the ARC login node. Since the ARC DTN has the same shares as ARC, any files you transfer to the DTN will also be available on ARC.<br />
}}<br />
<br />
<br />
= Command Line File Transfer Tools =<br />
You may use the following command-line file transfer utilities on Linux, MacOS, and Windows. File transfers using these methods require your computer to be on the University of Calgary campus network or via the University of Calgary IT General VPN.<br />
<br />
If you are working on a Windows computer, you will need to install these utilities separately as they are not installed by default. Newer versions of Windows 10 (1903 and up) have '''SSH''' built-in as part of the '''openssh''' package. However, you may be better off using one of the [[#GUI File Transfer]] tools listed in the following section.<br />
<br />
== <code>scp</code>: Secure Copy ==<br />
<code>scp</code> is a secure and encrypted method of transferring files between machines via SSH. It is available on Linux and Mac computers by default and can be installed on Windows by installing the OpenSSH package.<br />
<br />
The general format for the command is:<br />
$ scp [options] source destination<br />
<br />
* The <code>source</code> and <code>destination</code> fields can be a local file / directory or a remote one.<br />
* The ''local location'' is a normal Unix path, absolute or relative and <br />
* The ''remote location'' has a format <code>username@remote.system.name:file/path</code>.<br />
* The ''remote relative file path'' is relative to the home directory of the <code>username</code> on the remote system.<br />
<br />
You may see all the available options with <code>scp</code> by viewing the [http://man7.org/linux/man-pages/man1/scp.1.html man page].<br />
<br />
=== Example Usage ===<br />
Common operations are given below. On your desktop, to:<br />
* Transfer a single file (eg. <code>data.dat</code>) to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp data.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Transfer all files ending with <code>.dat</code> to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp *.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* To transfer an entire directory to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp -r my_data_directory/ username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
<br />
== <code>rsync</code> ==<br />
<code>rsync</code> is a utility for transferring and synchronizing files efficiently. The efficiency for its file synchronization is achieved by its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. <br />
<br />
<code>rsync</code> can be used to copy files and directories locally on a system or between two computers via SSH. Unlike <code>scp</code>. Because it is designed to synchronize two locations, partial transfers can be restarted by re-running <code>rsync</code> without losing progress. Resuming a partial transfer is not possible with <code>scp</code>.<br />
<br />
The general format for the command is similar to '''scp''':<br />
$ rsync [options] source destination<br />
* The <code>source</code> and <code>destination</code> fields can be a local file / directory or a remote one.<br />
* <code>rsync</code> '''cannot''' transfer files between '''two remote''' locations. The source or the destination must be a local path.<br />
* The ''local location'' is a normal Unix path, absolute or relative and <br />
* The ''remote location'' has a format <code>username@remote.system.name:file/path</code>.<br />
* The ''remote relative file path'' is relative to the home directory of the <code>username</code> on the remote system.<br />
<br />
You may see all the available options with <code>rsync</code> by viewing the [http://man7.org/linux/man-pages/man1/rsync.1.html man page].<br />
<br />
=== Example Usage ===<br />
Common operations are given below. On your desktop, to:<br />
<br />
* Upload a single file (eg. <code>data.dat</code>) from your workstation to your ARC: <syntaxhighlight lang="bash"><br />
desktop$ rsync -v data.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Upload all files matching a wildcard (eg. ending in <code>*.dat</code>): <syntaxhighlight lang="bash"><br />
$ rsync -v *.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Upload an entire directory (eg. <code>my_data</code> to <code>~/projects/project2</code>): <syntaxhighlight lang="bash"><br />
$ rsync -axv my_data username@arc-dtn.ucalgary.ca:~projects/project2/<br />
</syntaxhighlight><br />
* Upload more than one directory: <syntaxhighlight lang="bash"><br />
desktop$ rsync -axv my_data1 my_data2 my_data3 username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Download one file (eg. <code>output.dat</code>) from ARC to the current directory on your workstation: <syntaxhighlight lang="bash"><br />
## Note the '.' at the end of the command which references the current working directory on your computer<br />
desktop$ rsync -v username@arc-dtn.ucalgary.ca:projects/project1/output.dat .<br />
</syntaxhighlight><br />
* Download one directory (eg. <code>outputs</code>) from ARC to the current directory on your workstation:<syntaxhighlight lang="bash"><br />
desktop$ rsync -axv username@arc-dtn.ucalgary.ca:projects/project1/outputs .<br />
</syntaxhighlight><br />
<br />
== <code>sftp</code>: secure file transfer protocol ==<br />
<br />
* Manual page on-line: http://man7.org/linux/man-pages/man1/sftp.1.html<br />
<br />
<br />
'''sftp''' is a file transfer program, similar to '''ftp''', <br />
which performs all operations over an encrypted '''ssh''' transport. <br />
It may also use many features of '''ssh''', such as public key authentication and compression.<br />
<br />
<br />
'''sftp''' has an interactive mode, <br />
in which sftp understands a set of commands similar to those of '''ftp'''.<br />
Commands are case insensitive.<br />
<br />
== <code>rclone</code>: rsync for cloud storage ==<br />
'''Rclone''' is a command line program to sync files and directories to and from a number of on-line storage services.<br />
<br />
* https://rclone.org/<br />
You may backup your data from arc-dtn to your personal 5TB UCalgary OneDrive to create a safe second copy at a distance.<br />
<br />
[https://rcs.ucalgary.ca/images/8/8e/Rclone_and_OneDrive_on_arc.pdf detailed rclone configuration instructions]<br />
<br />
== <code>curl</code> and <code>wget</code>: downloading from the internet ==<br />
To download a file with the ability to resume a partial download:<br />
curl -c http://example.com/resource.tar.gz -O <br />
wget -c http://example.com/resource.tar.gz<br />
<br />
<br />
= Graphical File Transfer Tools =<br />
== FileZilla ==<br />
FileZilla is a free cross-platform file transfer program that can transfer files via FTP and SFTP. <br />
<br />
=== Installation ===<br />
You may obtain Filezilla from the project's official website at: https://filezilla-project.org/download.php?type=client. '''Please note''': The official installer may bundle ads and unwanted software. Be careful when clicking through.<br />
<br />
Alternatively, you may obtain Filezilla from Ninite: https://ninite.com/filezilla<br />
<br />
=== Connecting to ARC ===<br />
If working off campus, first connect to the University of Calgary General VPN. Open Filezilla and connect to <code>arc-dtn.ucalgary.ca</code> on port <code>22</code>.<br />
[[File:Filezilla.jpg|alt=Connecting to ARC using Filezilla|none|thumb|Connecting to ARC using Filezilla]] <br />
<br />
== MobaXterm (Windows) ==<br />
'''MobaXterm''' is the recommended tool for remote access and data transfer in '''Windows''' OSes.<br />
<br />
MobaXterm is a one-stop solution for most remote access work on a compute cluster or a Unix / Linux server.<br />
<br />
It provides many Unix like utilities for Windows including an '''SSH''' client and '''X11''' graphics server. It provides a graphical interface for <br />
data transfer operations.<br />
<br />
* Website: https://mobaxterm.mobatek.net/<br />
<br />
== WinSCP (Windows) ==<br />
WinSCP is a free Windows file transfer tool.<br />
<br />
https://winscp.net/eng/index.php<br />
<br />
= Cloud based Fire Transfer Services =<br />
== Globus File Transfer ==<br />
Globus File Transfer is a cloud based service for file transfer and file sharing. It uses GridFTP for high speed and reliable data transfers<br><br />
<br />
=== How to get started ===<br />
<br />
# Navigate to the web page https://www.globusid.org <br />
# Create a Globus ID and password using your google account.<br />
<br />
=== Use Globus Web Application to transfer files ===<br />
<br />
# To initiate data transfers using the Globus Web Application, navigate to https://www.globusid.org/login and log into your Globus account <br />
# On the left panel, click on File Manager to define/select the endpoints. For example, to transfer data from ARC cluster (endpoint 1) to Compute Canada cedar cluster (endpoint 2) <br />
#* Under collection, for ARC data transfer node choose 'endpoint 1' as 'ucalgary#arc-dtn.ucalgary.ca' from the drop down menu. Authenticate your access using UCalgary IT credentials. This will bring you to the home directory on ARC. <br />
#* Next, for Compute Canada cedar data transfer node choose 'endpoint 2' as 'computecanada#cedar-dtn' from the drop down menu. Again authenticate your access using Compute Canada credentials. Navigate to the location where you want to transfer the file. <br />
#* Select the file to be transferred from 'endpoint 1' and initiate the transfer process. <br />
<br />
=== Use Globus Web Application to Share Files ===<br />
You may grant access to a folder in your allocation on ARC to be used for uploading or sharing files with any individual with a globus account, either through ComputeCanada or their own institution. <br />
<br />
Please see https://docs.globus.org/how-to/share-files/<br />
<br />
Here is a screen capture with an example using arc-dtn.ucalgary.ca <br />
[[File:DmitriFolder3.png|none|thumb|645x645px|The user tthomas created a folder on ARC:/home/tthomas/dmitriFolder using the Globus Web application and shared it with individuals at external organizations.]]<br />
<br />
<br />
<br />
tannistha.nandi@gmail has read and write access, ipercel@computecanada has read only access.<br />
<br />
The external identity rozmanov@globusid.org has been granted the ability to assist with adding/administering access.<br />
[[File:DmitriFolder2.png|none|thumb|577x577px|please note: the owner of an ARC allocation is accountable for any activities of the individuals with access granted via GLOBUS. ]]<br />
<br />
<br />
rozmanov@globus.org may add research collaborators to this allocation on ARC in order to share data.<br />
<br />
= Special cases =<br />
<br />
== Transferring Large Datasets ==<br />
=== Using screen and rsync ===<br />
If you want to transfer a large amount of data from a remote Unix system to ARC you can use ''<code>rsync</code>'' to handle the transfer. However, you will have to keep your SSH session from your workstation connected during the entire transfer which is often not convenient or not feasible. <br />
<br />
To overcome this one can run the '''rsync''' transfer inside a '''screen''' virtual session on ARC. '''screen''' creates an SSH session local to ARC and allows for reconnection from SSH sessions from your workstation.<br />
<br />
To begin, login to ARC and start <code>screen</code> with the screen command:<br />
<pre><br />
# Login to ARC<br />
$ ssh username@arc.ucalgary.ca<br />
<br />
# Start a screen session<br />
$ screen<br />
<br />
# While in the new screen session, start the transfer with rsync.<br />
$ rsync -axv ext_user@external.system:path/to/remote/data .<br />
<br />
# Disconnect from the screen session with the hotkey 'Ctrl-a d'<br />
# You may now disconnect from ARC or close the lid of you laptop or turn off the computer.<br />
</pre><br />
<br />
To check if the transfer has been finished.<br />
<pre><br />
# Login to ARC<br />
$ ssh username@arc.ucalgary.ca<br />
<br />
# Reconnect to the screen session<br />
$ screen -r<br />
<br />
# If the transfer has been finished close the screen session.<br />
$ exit<br />
<br />
# If the transfer is still running, disconnect from the screen session with the hotkey 'Ctrl-a d'<br />
</pre><br />
<br />
=== Very large files ===<br />
If the files are large and the transfer speed is low the transfer may fail before the file has been transferred. <br />
'''rsync''' may not be of help here, as it will not restart the file transfer (have not tested recently).<br />
<br />
The solution may be to split the large file into smaller chunks, transfer them using rsync and then join them on the remote system (ARC for example):<br />
<br />
<pre><br />
# Large file is 506MB in this example.<br />
$ ls -l t.bin<br />
-rw-r--r-- 1 drozmano drozmano 530308481 Jun 8 11:06 t.bin<br />
<br />
# split the file:<br />
$ split -b 100M t.bin t.bin_chunk.<br />
<br />
# Check the chunks.<br />
$ ls -l t.bin_chunk.*<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.aa<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ab<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ac<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ad<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ae<br />
-rw-r--r-- 1 drozmano drozmano 6020481 Jun 8 11:09 t.bin_chunk.af<br />
<br />
# Transfer the files:<br />
$ rsync -axv t.bin_chunks.* username@arc.ucalgary.ca:<br />
</pre><br />
<br />
Then login to ARC and join the files:<br />
<pre><br />
$ cat t.bin_chunk.* > t.bin<br />
<br />
$ ls -l <br />
-rw-r--r-- 1 drozmano drozmano 530308481 Jun 8 11:06 t.bin<br />
</pre><br />
Success.<br />
<br />
[[Category:Guides]]</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=How_to_transfer_data&diff=1350How to transfer data2021-06-02T21:31:32Z<p>Tthomas: /* rclone: rsync for cloud storage */</p>
<hr />
<div>{{Message Box<br />
|title=Data Transfer Nodes<br />
|message=<br />
For performance and resource reasons, file transfers should be performed on the data transfer node arc-dtn.ucalgary.ca rather than on the the ARC login node. Since the ARC DTN has the same shares as ARC, any files you transfer to the DTN will also be available on ARC.<br />
}}<br />
<br />
<br />
= Command Line File Transfer Tools =<br />
You may use the following command-line file transfer utilities on Linux, MacOS, and Windows. File transfers using these methods require your computer to be on the University of Calgary campus network or via the University of Calgary IT General VPN.<br />
<br />
If you are working on a Windows computer, you will need to install these utilities separately as they are not installed by default. Newer versions of Windows 10 (1903 and up) have '''SSH''' built-in as part of the '''openssh''' package. However, you may be better off using one of the [[#GUI File Transfer]] tools listed in the following section.<br />
<br />
== <code>scp</code>: Secure Copy ==<br />
<code>scp</code> is a secure and encrypted method of transferring files between machines via SSH. It is available on Linux and Mac computers by default and can be installed on Windows by installing the OpenSSH package.<br />
<br />
The general format for the command is:<br />
$ scp [options] source destination<br />
<br />
* The <code>source</code> and <code>destination</code> fields can be a local file / directory or a remote one.<br />
* The ''local location'' is a normal Unix path, absolute or relative and <br />
* The ''remote location'' has a format <code>username@remote.system.name:file/path</code>.<br />
* The ''remote relative file path'' is relative to the home directory of the <code>username</code> on the remote system.<br />
<br />
You may see all the available options with <code>scp</code> by viewing the [http://man7.org/linux/man-pages/man1/scp.1.html man page].<br />
<br />
=== Example Usage ===<br />
Common operations are given below. On your desktop, to:<br />
* Transfer a single file (eg. <code>data.dat</code>) to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp data.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Transfer all files ending with <code>.dat</code> to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp *.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* To transfer an entire directory to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp -r my_data_directory/ username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
<br />
== <code>rsync</code> ==<br />
<code>rsync</code> is a utility for transferring and synchronizing files efficiently. The efficiency for its file synchronization is achieved by its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. <br />
<br />
<code>rsync</code> can be used to copy files and directories locally on a system or between two computers via SSH. Unlike <code>scp</code>. Because it is designed to synchronize two locations, partial transfers can be restarted by re-running <code>rsync</code> without losing progress. Resuming a partial transfer is not possible with <code>scp</code>.<br />
<br />
The general format for the command is similar to '''scp''':<br />
$ rsync [options] source destination<br />
* The <code>source</code> and <code>destination</code> fields can be a local file / directory or a remote one.<br />
* <code>rsync</code> '''cannot''' transfer files between '''two remote''' locations. The source or the destination must be a local path.<br />
* The ''local location'' is a normal Unix path, absolute or relative and <br />
* The ''remote location'' has a format <code>username@remote.system.name:file/path</code>.<br />
* The ''remote relative file path'' is relative to the home directory of the <code>username</code> on the remote system.<br />
<br />
You may see all the available options with <code>rsync</code> by viewing the [http://man7.org/linux/man-pages/man1/rsync.1.html man page].<br />
<br />
=== Example Usage ===<br />
Common operations are given below. On your desktop, to:<br />
<br />
* Upload a single file (eg. <code>data.dat</code>) from your workstation to your ARC: <syntaxhighlight lang="bash"><br />
desktop$ rsync -v data.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Upload all files matching a wildcard (eg. ending in <code>*.dat</code>): <syntaxhighlight lang="bash"><br />
$ rsync -v *.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Upload an entire directory (eg. <code>my_data</code> to <code>~/projects/project2</code>): <syntaxhighlight lang="bash"><br />
$ rsync -axv my_data username@arc-dtn.ucalgary.ca:~projects/project2/<br />
</syntaxhighlight><br />
* Upload more than one directory: <syntaxhighlight lang="bash"><br />
desktop$ rsync -axv my_data1 my_data2 my_data3 username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Download one file (eg. <code>output.dat</code>) from ARC to the current directory on your workstation: <syntaxhighlight lang="bash"><br />
## Note the '.' at the end of the command which references the current working directory on your computer<br />
desktop$ rsync -v username@arc-dtn.ucalgary.ca:projects/project1/output.dat .<br />
</syntaxhighlight><br />
* Download one directory (eg. <code>outputs</code>) from ARC to the current directory on your workstation:<syntaxhighlight lang="bash"><br />
desktop$ rsync -axv username@arc-dtn.ucalgary.ca:projects/project1/outputs .<br />
</syntaxhighlight><br />
<br />
== <code>sftp</code>: secure file transfer protocol ==<br />
<br />
* Manual page on-line: http://man7.org/linux/man-pages/man1/sftp.1.html<br />
<br />
<br />
'''sftp''' is a file transfer program, similar to '''ftp''', <br />
which performs all operations over an encrypted '''ssh''' transport. <br />
It may also use many features of '''ssh''', such as public key authentication and compression.<br />
<br />
<br />
'''sftp''' has an interactive mode, <br />
in which sftp understands a set of commands similar to those of '''ftp'''.<br />
Commands are case insensitive.<br />
<br />
== <code>rclone</code>: rsync for cloud storage ==<br />
'''Rclone''' is a command line program to sync files and directories to and from a number of on-line storage services.<br />
<br />
* https://rclone.org/<br />
You may backup your data from arc-dtn to your ucalgary OneDrive to create a safe second copy at a distance.<br />
<br />
[https://rcs.ucalgary.ca/images/8/8e/Rclone_and_OneDrive_on_arc.pdf rclone configuration instructions]<br />
<br />
== <code>curl</code> and <code>wget</code>: downloading from the internet ==<br />
To download a file with the ability to resume a partial download:<br />
curl -c http://example.com/resource.tar.gz -O <br />
wget -c http://example.com/resource.tar.gz<br />
<br />
<br />
= Graphical File Transfer Tools =<br />
== FileZilla ==<br />
FileZilla is a free cross-platform file transfer program that can transfer files via FTP and SFTP. <br />
<br />
=== Installation ===<br />
You may obtain Filezilla from the project's official website at: https://filezilla-project.org/download.php?type=client. '''Please note''': The official installer may bundle ads and unwanted software. Be careful when clicking through.<br />
<br />
Alternatively, you may obtain Filezilla from Ninite: https://ninite.com/filezilla<br />
<br />
=== Connecting to ARC ===<br />
If working off campus, first connect to the University of Calgary General VPN. Open Filezilla and connect to <code>arc-dtn.ucalgary.ca</code> on port <code>22</code>.<br />
[[File:Filezilla.jpg|alt=Connecting to ARC using Filezilla|none|thumb|Connecting to ARC using Filezilla]] <br />
<br />
== MobaXterm (Windows) ==<br />
'''MobaXterm''' is the recommended tool for remote access and data transfer in '''Windows''' OSes.<br />
<br />
MobaXterm is a one-stop solution for most remote access work on a compute cluster or a Unix / Linux server.<br />
<br />
It provides many Unix like utilities for Windows including an '''SSH''' client and '''X11''' graphics server. It provides a graphical interface for <br />
data transfer operations.<br />
<br />
* Website: https://mobaxterm.mobatek.net/<br />
<br />
== WinSCP (Windows) ==<br />
WinSCP is a free Windows file transfer tool.<br />
<br />
https://winscp.net/eng/index.php<br />
<br />
= Cloud based Fire Transfer Services =<br />
== Globus File Transfer ==<br />
Globus File Transfer is a cloud based service for file transfer and file sharing. It uses GridFTP for high speed and reliable data transfers<br><br />
<br />
=== How to get started ===<br />
<br />
# Navigate to the web page https://www.globusid.org <br />
# Create a Globus ID and password using your google account.<br />
<br />
=== Use Globus Web Application to transfer files ===<br />
<br />
# To initiate data transfers using the Globus Web Application, navigate to https://www.globusid.org/login and log into your Globus account <br />
# On the left panel, click on File Manager to define/select the endpoints. For example, to transfer data from ARC cluster (endpoint 1) to Compute Canada cedar cluster (endpoint 2) <br />
#* Under collection, for ARC data transfer node choose 'endpoint 1' as 'ucalgary#arc-dtn.ucalgary.ca' from the drop down menu. Authenticate your access using UCalgary IT credentials. This will bring you to the home directory on ARC. <br />
#* Next, for Compute Canada cedar data transfer node choose 'endpoint 2' as 'computecanada#cedar-dtn' from the drop down menu. Again authenticate your access using Compute Canada credentials. Navigate to the location where you want to transfer the file. <br />
#* Select the file to be transferred from 'endpoint 1' and initiate the transfer process. <br />
<br />
=== Use Globus Web Application to Share Files ===<br />
You may grant access to a folder in your allocation on ARC to be used for uploading or sharing files with any individual with a globus account, either through ComputeCanada or their own institution. <br />
<br />
Please see https://docs.globus.org/how-to/share-files/<br />
<br />
Here is a screen capture with an example using arc-dtn.ucalgary.ca <br />
[[File:DmitriFolder3.png|none|thumb|645x645px|The user tthomas created a folder on ARC:/home/tthomas/dmitriFolder using the Globus Web application and shared it with individuals at external organizations.]]<br />
<br />
<br />
<br />
tannistha.nandi@gmail has read and write access, ipercel@computecanada has read only access.<br />
<br />
The external identity rozmanov@globusid.org has been granted the ability to assist with adding/administering access.<br />
[[File:DmitriFolder2.png|none|thumb|577x577px|please note: the owner of an ARC allocation is accountable for any activities of the individuals with access granted via GLOBUS. ]]<br />
<br />
<br />
rozmanov@globus.org may add research collaborators to this allocation on ARC in order to share data.<br />
<br />
= Special cases =<br />
<br />
== Transferring Large Datasets ==<br />
=== Using screen and rsync ===<br />
If you want to transfer a large amount of data from a remote Unix system to ARC you can use ''<code>rsync</code>'' to handle the transfer. However, you will have to keep your SSH session from your workstation connected during the entire transfer which is often not convenient or not feasible. <br />
<br />
To overcome this one can run the '''rsync''' transfer inside a '''screen''' virtual session on ARC. '''screen''' creates an SSH session local to ARC and allows for reconnection from SSH sessions from your workstation.<br />
<br />
To begin, login to ARC and start <code>screen</code> with the screen command:<br />
<pre><br />
# Login to ARC<br />
$ ssh username@arc.ucalgary.ca<br />
<br />
# Start a screen session<br />
$ screen<br />
<br />
# While in the new screen session, start the transfer with rsync.<br />
$ rsync -axv ext_user@external.system:path/to/remote/data .<br />
<br />
# Disconnect from the screen session with the hotkey 'Ctrl-a d'<br />
# You may now disconnect from ARC or close the lid of you laptop or turn off the computer.<br />
</pre><br />
<br />
To check if the transfer has been finished.<br />
<pre><br />
# Login to ARC<br />
$ ssh username@arc.ucalgary.ca<br />
<br />
# Reconnect to the screen session<br />
$ screen -r<br />
<br />
# If the transfer has finished, close the screen session.<br />
$ exit<br />
<br />
# If the transfer is still running, disconnect from the screen session with the hotkey 'Ctrl-a d'<br />
</pre><br />
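If you run several transfers at the same time, it can help to give each '''screen''' session a name so you can reattach to the right one later; the session name below is just an example.<br />
<pre><br />
# Start a named screen session for this transfer.<br />
$ screen -S transfer<br />
<br />
# List your running screen sessions.<br />
$ screen -ls<br />
<br />
# Reattach to the named session.<br />
$ screen -r transfer<br />
</pre><br />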
<br />
=== Very large files ===<br />
If the files are large and the transfer speed is low, the transfer may fail before a file has been completely transferred. <br />
'''rsync''' may not help here, as it may not restart a partially transferred file (this has not been re-tested recently).<br />
<br />
The solution may be to split the large file into smaller chunks, transfer them using rsync and then join them on the remote system (ARC for example):<br />
<br />
<pre><br />
# Large file is 506MB in this example.<br />
$ ls -l t.bin<br />
-rw-r--r-- 1 drozmano drozmano 530308481 Jun 8 11:06 t.bin<br />
<br />
# split the file:<br />
$ split -b 100M t.bin t.bin_chunk.<br />
<br />
# Check the chunks.<br />
$ ls -l t.bin_chunk.*<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.aa<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ab<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ac<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ad<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ae<br />
-rw-r--r-- 1 drozmano drozmano 6020481 Jun 8 11:09 t.bin_chunk.af<br />
<br />
# Transfer the files:<br />
$ rsync -axv t.bin_chunk.* username@arc.ucalgary.ca:<br />
</pre><br />
<br />
Then login to ARC and join the files:<br />
<pre><br />
$ cat t.bin_chunk.* > t.bin<br />
<br />
$ ls -l <br />
-rw-r--r-- 1 drozmano drozmano 530308481 Jun 8 11:06 t.bin<br />
</pre><br />
Success.<br />
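To confirm that the joined file is identical to the original, you can compare checksums on the two systems; <code>md5sum</code> is used here, but any checksum tool available on both systems will do.<br />
<pre><br />
# On the source system:<br />
$ md5sum t.bin<br />
<br />
# On ARC, after joining the chunks:<br />
$ md5sum t.bin<br />
<br />
# The two checksums must match exactly.<br />
</pre><br />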
<br />
[[Category:Guides]]</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=File:Rclone.svg&diff=1349File:Rclone.svg2021-06-02T21:23:36Z<p>Tthomas: </p>
<hr />
<div>rclone</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=File:Rclone_and_OneDrive_on_arc.pdf&diff=1348File:Rclone and OneDrive on arc.pdf2021-06-02T21:18:23Z<p>Tthomas: </p>
<hr />
<div>detailed instructions to configure rclone with your UofC OneDrive allocation on arc-dtn</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=File:To_use_rclone_with_OneDrive_on_arc.pdf&diff=1347File:To use rclone with OneDrive on arc.pdf2021-06-02T21:12:10Z<p>Tthomas: </p>
<hr />
<div>detailed description of how to configure rclone with UCalgary OneDrive account, and arc-dtn</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=How_to_transfer_data&diff=1341How to transfer data2021-05-28T16:41:50Z<p>Tthomas: /* Use Globus Web Application to Share Files */</p>
<hr />
<div>{{Message Box<br />
|title=Data Transfer Nodes<br />
|message=<br />
For performance and resource reasons, file transfers should be performed on the data transfer node arc-dtn.ucalgary.ca rather than on the ARC login node. Since the ARC DTN has the same shares as ARC, any files you transfer to the DTN will also be available on ARC.<br />
}}<br />
<br />
<br />
= Command Line File Transfer Tools =<br />
You may use the following command-line file transfer utilities on Linux, MacOS, and Windows. File transfers using these methods require your computer to be on the University of Calgary campus network or via the University of Calgary IT General VPN.<br />
<br />
If you are working on a Windows computer, you will need to install these utilities separately as they are not installed by default. Newer versions of Windows 10 (1903 and up) have '''SSH''' built-in as part of the '''openssh''' package. However, you may be better off using one of the [[#Graphical File Transfer Tools|graphical file transfer tools]] listed in the following section.<br />
<br />
== <code>scp</code>: Secure Copy ==<br />
<code>scp</code> is a secure and encrypted method of transferring files between machines via SSH. It is available on Linux and Mac computers by default and can be installed on Windows by installing the OpenSSH package.<br />
<br />
The general format for the command is:<br />
$ scp [options] source destination<br />
<br />
* The <code>source</code> and <code>destination</code> fields can be a local file / directory or a remote one.<br />
* The ''local location'' is a normal Unix path, absolute or relative and <br />
* The ''remote location'' has a format <code>username@remote.system.name:file/path</code>.<br />
* The ''remote relative file path'' is relative to the home directory of the <code>username</code> on the remote system.<br />
<br />
You may see all the available options with <code>scp</code> by viewing the [http://man7.org/linux/man-pages/man1/scp.1.html man page].<br />
<br />
=== Example Usage ===<br />
Common operations are given below. On your desktop, to:<br />
* Transfer a single file (eg. <code>data.dat</code>) to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp data.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Transfer all files ending with <code>.dat</code> to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp *.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* To transfer an entire directory to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp -r my_data_directory/ username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
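* Download a single file (eg. <code>output.dat</code>) from ARC to the current directory on your desktop (the same syntax as above, with the remote location as the source): <syntaxhighlight lang="bash"><br />
## Note the '.' at the end of the command which references the current working directory on your computer<br />
desktop$ scp username@arc-dtn.ucalgary.ca:projects/project1/output.dat .<br />
</syntaxhighlight><br />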
<br />
== <code>rsync</code> ==<br />
<code>rsync</code> is a utility for transferring and synchronizing files efficiently. The efficiency for its file synchronization is achieved by its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. <br />
<br />
<code>rsync</code> can be used to copy files and directories locally on a system or between two computers via SSH. Unlike <code>scp</code>, it is designed to synchronize two locations, so partial transfers can be restarted by re-running <code>rsync</code> without losing progress. Resuming a partial transfer is not possible with <code>scp</code>.<br />
<br />
The general format for the command is similar to '''scp''':<br />
$ rsync [options] source destination<br />
* The <code>source</code> and <code>destination</code> fields can be a local file / directory or a remote one.<br />
* <code>rsync</code> '''cannot''' transfer files between '''two remote''' locations. The source or the destination must be a local path.<br />
* The ''local location'' is a normal Unix path, absolute or relative and <br />
* The ''remote location'' has a format <code>username@remote.system.name:file/path</code>.<br />
* The ''remote relative file path'' is relative to the home directory of the <code>username</code> on the remote system.<br />
<br />
You may see all the available options with <code>rsync</code> by viewing the [http://man7.org/linux/man-pages/man1/rsync.1.html man page].<br />
<br />
=== Example Usage ===<br />
Common operations are given below. On your desktop, to:<br />
<br />
* Upload a single file (eg. <code>data.dat</code>) from your workstation to your ARC: <syntaxhighlight lang="bash"><br />
desktop$ rsync -v data.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Upload all files matching a wildcard (eg. ending in <code>*.dat</code>): <syntaxhighlight lang="bash"><br />
$ rsync -v *.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Upload an entire directory (eg. <code>my_data</code> to <code>~/projects/project2</code>): <syntaxhighlight lang="bash"><br />
$ rsync -axv my_data username@arc-dtn.ucalgary.ca:~/projects/project2/<br />
</syntaxhighlight><br />
* Upload more than one directory: <syntaxhighlight lang="bash"><br />
desktop$ rsync -axv my_data1 my_data2 my_data3 username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Download one file (eg. <code>output.dat</code>) from ARC to the current directory on your workstation: <syntaxhighlight lang="bash"><br />
## Note the '.' at the end of the command which references the current working directory on your computer<br />
desktop$ rsync -v username@arc-dtn.ucalgary.ca:projects/project1/output.dat .<br />
</syntaxhighlight><br />
* Download one directory (eg. <code>outputs</code>) from ARC to the current directory on your workstation:<syntaxhighlight lang="bash"><br />
desktop$ rsync -axv username@arc-dtn.ucalgary.ca:projects/project1/outputs .<br />
</syntaxhighlight><br />
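* Resume an interrupted upload by re-running the same command. Adding <code>--partial</code> (optional) keeps partially transferred files so <code>rsync</code> does not start them again from scratch: <syntaxhighlight lang="bash"><br />
desktop$ rsync -axv --partial my_data username@arc-dtn.ucalgary.ca:/desired/destination<br />
## If the transfer is interrupted, simply run the same command again to continue.<br />
desktop$ rsync -axv --partial my_data username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />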
<br />
== <code>sftp</code>: secure file transfer protocol ==<br />
<br />
* Manual page on-line: http://man7.org/linux/man-pages/man1/sftp.1.html<br />
<br />
<br />
'''sftp''' is a file transfer program, similar to '''ftp''', <br />
which performs all operations over an encrypted '''ssh''' transport. <br />
It may also use many features of '''ssh''', such as public key authentication and compression.<br />
<br />
<br />
'''sftp''' has an interactive mode, <br />
in which sftp understands a set of commands similar to those of '''ftp'''.<br />
Commands are case insensitive.<br />
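A short interactive session looks like this (the file and directory names are examples only):<br />
<pre><br />
# Connect to the ARC data transfer node.<br />
$ sftp username@arc-dtn.ucalgary.ca<br />
<br />
# At the sftp> prompt, use ftp-like commands:<br />
sftp> ls<br />
sftp> cd projects/project1<br />
sftp> put data.dat<br />
sftp> get output.dat<br />
sftp> exit<br />
</pre><br />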
<br />
== <code>rclone</code>: rsync for cloud storage ==<br />
'''Rclone''' is a command line program to sync files and directories to and from a number of on-line storage services.<br />
<br />
* https://rclone.org/<br />
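A minimal sketch of typical usage, assuming you have already defined a remote named <code>onedrive</code> by running <code>rclone config</code> (see the detailed rclone and OneDrive instructions for the configuration steps):<br />
<syntaxhighlight lang="bash"><br />
# Interactively define a remote (one time).<br />
$ rclone config<br />
<br />
# List the top-level folders on the remote.<br />
$ rclone lsd onedrive:<br />
<br />
# Copy a local directory to the remote, and back again.<br />
$ rclone copy my_data onedrive:backup/my_data<br />
$ rclone copy onedrive:backup/my_data my_data<br />
</syntaxhighlight><br />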
<br />
== <code>curl</code> and <code>wget</code>: downloading from the internet ==<br />
To download a file with the ability to resume a partial download:<br />
 curl -C - -O http://example.com/resource.tar.gz<br />
wget -c http://example.com/resource.tar.gz<br />
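For large downloads it is usually better to run the download directly on the data transfer node rather than downloading to your workstation and uploading again, assuming the data transfer node can reach the site and the tool you need is installed there:<br />
<syntaxhighlight lang="bash"><br />
# Log in to the data transfer node and download there directly.<br />
$ ssh username@arc-dtn.ucalgary.ca<br />
$ wget -c http://example.com/resource.tar.gz<br />
</syntaxhighlight><br />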
<br />
<br />
= Graphical File Transfer Tools =<br />
== FileZilla ==<br />
FileZilla is a free cross-platform file transfer program that can transfer files via FTP and SFTP. <br />
<br />
=== Installation ===<br />
You may obtain Filezilla from the project's official website at: https://filezilla-project.org/download.php?type=client. '''Please note''': The official installer may bundle ads and unwanted software. Be careful when clicking through.<br />
<br />
Alternatively, you may obtain Filezilla from Ninite: https://ninite.com/filezilla<br />
<br />
=== Connecting to ARC ===<br />
If working off campus, first connect to the University of Calgary General VPN. Open Filezilla and connect to <code>arc-dtn.ucalgary.ca</code> on port <code>22</code>.<br />
[[File:Filezilla.jpg|alt=Connecting to ARC using Filezilla|none|thumb|Connecting to ARC using Filezilla]] <br />
<br />
== MobaXterm (Windows) ==<br />
'''MobaXterm''' is the recommended tool for remote access and data transfer in '''Windows''' OSes.<br />
<br />
MobaXterm is a one-stop solution for most remote access work on a compute cluster or a Unix / Linux server.<br />
<br />
It provides many Unix like utilities for Windows including an '''SSH''' client and '''X11''' graphics server. It provides a graphical interface for <br />
data transfer operations.<br />
<br />
* Website: https://mobaxterm.mobatek.net/<br />
<br />
== WinSCP (Windows) ==<br />
WinSCP is a free Windows file transfer tool.<br />
<br />
https://winscp.net/eng/index.php<br />
<br />
= Cloud-based File Transfer Services =<br />
== Globus File Transfer ==<br />
Globus File Transfer is a cloud based service for file transfer and file sharing. It uses GridFTP for high speed and reliable data transfers<br><br />
<br />
=== How to get started ===<br />
<br />
# Navigate to the web page https://www.globusid.org <br />
# Create a Globus ID and password using your google account.<br />
<br />
=== Use Globus Web Application to transfer files ===<br />
<br />
# To initiate data transfers using the Globus Web Application, navigate to https://www.globusid.org/login and log into your Globus account <br />
# On the left panel, click on File Manager to define/select the endpoints. For example, to transfer data from ARC cluster (endpoint 1) to Compute Canada cedar cluster (endpoint 2) <br />
#* Under collection, for ARC data transfer node choose 'endpoint 1' as 'ucalgary#arc-dtn.ucalgary.ca' from the drop down menu. Authenticate your access using UCalgary IT credentials. This will bring you to the home directory on ARC. <br />
#* Next, for Compute Canada cedar data transfer node choose 'endpoint 2' as 'computecanada#cedar-dtn' from the drop down menu. Again authenticate your access using Compute Canada credentials. Navigate to the location where you want to transfer the file. <br />
#* Select the file to be transferred from 'endpoint 1' and initiate the transfer process. <br />
<br />
=== Use Globus Web Application to Share Files ===<br />
You may grant access to a folder in your allocation on ARC to be used for uploading or sharing files with any individual with a globus account, either through ComputeCanada or their own institution. <br />
<br />
Please see https://docs.globus.org/how-to/share-files/<br />
<br />
Here is a screen capture with an example using arc-dtn.ucalgary.ca <br />
[[File:DmitriFolder3.png|none|thumb|645x645px|The user tthomas created a folder on ARC:/home/tthomas/dmitriFolder using the Globus Web application and shared it with individuals at external organizations.]]<br />
<br />
<br />
<br />
tannistha.nandi@gmail.com has read and write access, ipercel@computecanada.ca has read-only access.<br />
<br />
The external identity rozmanov@globusid.org has been granted the ability to assist with adding/administering access.<br />
[[File:DmitriFolder2.png|none|thumb|577x577px|please note: the owner of an ARC allocation is accountable for any activities of the individuals with access granted via GLOBUS. ]]<br />
<br />
<br />
rozmanov@globusid.org may add research collaborators to this allocation on ARC in order to share data.<br />
<br />
= Special cases =<br />
<br />
== Transferring Large Datasets ==<br />
=== Using screen and rsync ===<br />
If you want to transfer a large amount of data from a remote Unix system to ARC you can use ''<code>rsync</code>'' to handle the transfer. However, you will have to keep your SSH session from your workstation connected during the entire transfer which is often not convenient or not feasible. <br />
<br />
To overcome this, one can run the '''rsync''' transfer inside a '''screen''' virtual session on ARC. '''screen''' creates a persistent terminal session on ARC that keeps running after you disconnect, so you can log back in later from your workstation and reattach to it.<br />
<br />
To begin, login to ARC and start <code>screen</code> with the screen command:<br />
<pre><br />
# Login to ARC<br />
$ ssh username@arc.ucalgary.ca<br />
<br />
# Start a screen session<br />
$ screen<br />
<br />
# While in the new screen session, start the transfer with rsync.<br />
$ rsync -axv ext_user@external.system:path/to/remote/data .<br />
<br />
# Disconnect from the screen session with the hotkey 'Ctrl-a d'<br />
# You may now disconnect from ARC, close the lid of your laptop, or turn off the computer.<br />
</pre><br />
<br />
To check whether the transfer has finished, log back in and reattach to the screen session:<br />
<pre><br />
# Login to ARC<br />
$ ssh username@arc.ucalgary.ca<br />
<br />
# Reconnect to the screen session<br />
$ screen -r<br />
<br />
# If the transfer has finished, close the screen session.<br />
$ exit<br />
<br />
# If the transfer is still running, disconnect from the screen session with the hotkey 'Ctrl-a d'<br />
</pre><br />
<br />
=== Very large files ===<br />
If the files are large and the transfer speed is low, the transfer may fail before a file has been completely transferred. <br />
'''rsync''' may not help here, as it may not restart a partially transferred file (this has not been re-tested recently).<br />
<br />
The solution may be to split the large file into smaller chunks, transfer them using rsync and then join them on the remote system (ARC for example):<br />
<br />
<pre><br />
# Large file is 506MB in this example.<br />
$ ls -l t.bin<br />
-rw-r--r-- 1 drozmano drozmano 530308481 Jun 8 11:06 t.bin<br />
<br />
# split the file:<br />
$ split -b 100M t.bin t.bin_chunk.<br />
<br />
# Check the chunks.<br />
$ ls -l t.bin_chunk.*<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.aa<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ab<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ac<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ad<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ae<br />
-rw-r--r-- 1 drozmano drozmano 6020481 Jun 8 11:09 t.bin_chunk.af<br />
<br />
# Transfer the files:<br />
$ rsync -axv t.bin_chunk.* username@arc.ucalgary.ca:<br />
</pre><br />
<br />
Then login to ARC and join the files:<br />
<pre><br />
$ cat t.bin_chunk.* > t.bin<br />
<br />
$ ls -l <br />
-rw-r--r-- 1 drozmano drozmano 530308481 Jun 8 11:06 t.bin<br />
</pre><br />
Success.<br />
<br />
[[Category:Guides]]</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=How_to_transfer_data&diff=1340How to transfer data2021-05-28T16:41:26Z<p>Tthomas: /* Use Globus Web Application to Share Files */</p>
<hr />
<div>{{Message Box<br />
|title=Data Transfer Nodes<br />
|message=<br />
For performance and resource reasons, file transfers should be performed on the data transfer node arc-dtn.ucalgary.ca rather than on the ARC login node. Since the ARC DTN has the same shares as ARC, any files you transfer to the DTN will also be available on ARC.<br />
}}<br />
<br />
<br />
= Command Line File Transfer Tools =<br />
You may use the following command-line file transfer utilities on Linux, MacOS, and Windows. File transfers using these methods require your computer to be on the University of Calgary campus network or via the University of Calgary IT General VPN.<br />
<br />
If you are working on a Windows computer, you will need to install these utilities separately as they are not installed by default. Newer versions of Windows 10 (1903 and up) have '''SSH''' built-in as part of the '''openssh''' package. However, you may be better off using one of the [[#Graphical File Transfer Tools|graphical file transfer tools]] listed in the following section.<br />
<br />
== <code>scp</code>: Secure Copy ==<br />
<code>scp</code> is a secure and encrypted method of transferring files between machines via SSH. It is available on Linux and Mac computers by default and can be installed on Windows by installing the OpenSSH package.<br />
<br />
The general format for the command is:<br />
$ scp [options] source destination<br />
<br />
* The <code>source</code> and <code>destination</code> fields can be a local file / directory or a remote one.<br />
* The ''local location'' is a normal Unix path, absolute or relative and <br />
* The ''remote location'' has a format <code>username@remote.system.name:file/path</code>.<br />
* The ''remote relative file path'' is relative to the home directory of the <code>username</code> on the remote system.<br />
<br />
You may see all the available options with <code>scp</code> by viewing the [http://man7.org/linux/man-pages/man1/scp.1.html man page].<br />
<br />
=== Example Usage ===<br />
Common operations are given below. On your desktop, to:<br />
* Transfer a single file (eg. <code>data.dat</code>) to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp data.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Transfer all files ending with <code>.dat</code> to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp *.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* To transfer an entire directory to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp -r my_data_directory/ username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
<br />
== <code>rsync</code> ==<br />
<code>rsync</code> is a utility for transferring and synchronizing files efficiently. The efficiency for its file synchronization is achieved by its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. <br />
<br />
<code>rsync</code> can be used to copy files and directories locally on a system or between two computers via SSH. Unlike <code>scp</code>, it is designed to synchronize two locations, so partial transfers can be restarted by re-running <code>rsync</code> without losing progress. Resuming a partial transfer is not possible with <code>scp</code>.<br />
<br />
The general format for the command is similar to '''scp''':<br />
$ rsync [options] source destination<br />
* The <code>source</code> and <code>destination</code> fields can be a local file / directory or a remote one.<br />
* <code>rsync</code> '''cannot''' transfer files between '''two remote''' locations. The source or the destination must be a local path.<br />
* The ''local location'' is a normal Unix path, absolute or relative and <br />
* The ''remote location'' has a format <code>username@remote.system.name:file/path</code>.<br />
* The ''remote relative file path'' is relative to the home directory of the <code>username</code> on the remote system.<br />
<br />
You may see all the available options with <code>rsync</code> by viewing the [http://man7.org/linux/man-pages/man1/rsync.1.html man page].<br />
<br />
=== Example Usage ===<br />
Common operations are given below. On your desktop, to:<br />
<br />
* Upload a single file (eg. <code>data.dat</code>) from your workstation to your ARC: <syntaxhighlight lang="bash"><br />
desktop$ rsync -v data.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Upload all files matching a wildcard (eg. ending in <code>*.dat</code>): <syntaxhighlight lang="bash"><br />
$ rsync -v *.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Upload an entire directory (eg. <code>my_data</code> to <code>~/projects/project2</code>): <syntaxhighlight lang="bash"><br />
$ rsync -axv my_data username@arc-dtn.ucalgary.ca:~/projects/project2/<br />
</syntaxhighlight><br />
* Upload more than one directory: <syntaxhighlight lang="bash"><br />
desktop$ rsync -axv my_data1 my_data2 my_data3 username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Download one file (eg. <code>output.dat</code>) from ARC to the current directory on your workstation: <syntaxhighlight lang="bash"><br />
## Note the '.' at the end of the command which references the current working directory on your computer<br />
desktop$ rsync -v username@arc-dtn.ucalgary.ca:projects/project1/output.dat .<br />
</syntaxhighlight><br />
* Download one directory (eg. <code>outputs</code>) from ARC to the current directory on your workstation:<syntaxhighlight lang="bash"><br />
desktop$ rsync -axv username@arc-dtn.ucalgary.ca:projects/project1/outputs .<br />
</syntaxhighlight><br />
<br />
== <code>sftp</code>: secure file transfer protocol ==<br />
<br />
* Manual page on-line: http://man7.org/linux/man-pages/man1/sftp.1.html<br />
<br />
<br />
'''sftp''' is a file transfer program, similar to '''ftp''', <br />
which performs all operations over an encrypted '''ssh''' transport. <br />
It may also use many features of '''ssh''', such as public key authentication and compression.<br />
<br />
<br />
'''sftp''' has an interactive mode, <br />
in which sftp understands a set of commands similar to those of '''ftp'''.<br />
Commands are case insensitive.<br />
<br />
== <code>rclone</code>: rsync for cloud storage ==<br />
'''Rclone''' is a command line program to sync files and directories to and from a number of on-line storage services.<br />
<br />
* https://rclone.org/<br />
<br />
== <code>curl</code> and <code>wget</code>: downloading from the internet ==<br />
To download a file with the ability to resume a partial download:<br />
 curl -C - -O http://example.com/resource.tar.gz<br />
wget -c http://example.com/resource.tar.gz<br />
<br />
<br />
= Graphical File Transfer Tools =<br />
== FileZilla ==<br />
FileZilla is a free cross-platform file transfer program that can transfer files via FTP and SFTP. <br />
<br />
=== Installation ===<br />
You may obtain Filezilla from the project's official website at: https://filezilla-project.org/download.php?type=client. '''Please note''': The official installer may bundle ads and unwanted software. Be careful when clicking through.<br />
<br />
Alternatively, you may obtain Filezilla from Ninite: https://ninite.com/filezilla<br />
<br />
=== Connecting to ARC ===<br />
If working off campus, first connect to the University of Calgary General VPN. Open Filezilla and connect to <code>arc-dtn.ucalgary.ca</code> on port <code>22</code>.<br />
[[File:Filezilla.jpg|alt=Connecting to ARC using Filezilla|none|thumb|Connecting to ARC using Filezilla]] <br />
<br />
== MobaXterm (Windows) ==<br />
'''MobaXterm''' is the recommended tool for remote access and data transfer in '''Windows''' OSes.<br />
<br />
MobaXterm is a one-stop solution for most remote access work on a compute cluster or a Unix / Linux server.<br />
<br />
It provides many Unix like utilities for Windows including an '''SSH''' client and '''X11''' graphics server. It provides a graphical interface for <br />
data transfer operations.<br />
<br />
* Website: https://mobaxterm.mobatek.net/<br />
<br />
== WinSCP (Windows) ==<br />
WinSCP is a free Windows file transfer tool.<br />
<br />
https://winscp.net/eng/index.php<br />
<br />
= Cloud-based File Transfer Services =<br />
== Globus File Transfer ==<br />
Globus File Transfer is a cloud based service for file transfer and file sharing. It uses GridFTP for high speed and reliable data transfers<br><br />
<br />
=== How to get started ===<br />
<br />
# Navigate to the web page https://www.globusid.org <br />
# Create a Globus ID and password using your google account.<br />
<br />
=== Use Globus Web Application to transfer files ===<br />
<br />
# To initiate data transfers using the Globus Web Application, navigate to https://www.globusid.org/login and log into your Globus account <br />
# On the left panel, click on File Manager to define/select the endpoints. For example, to transfer data from ARC cluster (endpoint 1) to Compute Canada cedar cluster (endpoint 2) <br />
#* Under collection, for ARC data transfer node choose 'endpoint 1' as 'ucalgary#arc-dtn.ucalgary.ca' from the drop down menu. Authenticate your access using UCalgary IT credentials. This will bring you to the home directory on ARC. <br />
#* Next, for Compute Canada cedar data transfer node choose 'endpoint 2' as 'computecanada#cedar-dtn' from the drop down menu. Again authenticate your access using Compute Canada credentials. Navigate to the location where you want to transfer the file. <br />
#* Select the file to be transferred from 'endpoint 1' and initiate the transfer process. <br />
<br />
=== Use Globus Web Application to Share Files ===<br />
You may grant access to a folder in your allocation on ARC to be used for uploading or sharing files with any individual with a globus account, either through ComputeCanada or their own institution. <br />
<br />
Please see https://docs.globus.org/how-to/share-files/<br />
<br />
Here is a screen capture with an example using arc-dtn.ucalgary.ca <br />
[[File:DmitriFolder3.png|none|thumb|645x645px|The user tthomas created a folder on ARC:/home/tthomas/dmitriFolder using the Globus Web application and shared it with individuals at external organizations.]]<br />
<br />
<br />
<br />
tannistha.nandi@gmail.com has read and write access, ipercel@computecanada.ca has read only access.<br />
<br />
The external identity rozmanov@globusid.org has been granted the ability to assist with adding/administering access.<br />
[[File:DmitriFolder2.png|none|thumb|577x577px|please note: the owner of an ARC allocation is accountable for any activities of the individuals with access granted via GLOBUS. ]]<br />
<br />
<br />
rozmanov@globusid.org may add research collaborators to this allocation on ARC in order to share data.<br />
<br />
= Special cases =<br />
<br />
== Transferring Large Datasets ==<br />
=== Using screen and rsync ===<br />
If you want to transfer a large amount of data from a remote Unix system to ARC you can use ''<code>rsync</code>'' to handle the transfer. However, you will have to keep your SSH session from your workstation connected during the entire transfer which is often not convenient or not feasible. <br />
<br />
To overcome this, one can run the '''rsync''' transfer inside a '''screen''' virtual session on ARC. '''screen''' creates a persistent terminal session on ARC that keeps running after you disconnect, so you can log back in later from your workstation and reattach to it.<br />
<br />
To begin, login to ARC and start <code>screen</code> with the screen command:<br />
<pre><br />
# Login to ARC<br />
$ ssh username@arc.ucalgary.ca<br />
<br />
# Start a screen session<br />
$ screen<br />
<br />
# While in the new screen session, start the transfer with rsync.<br />
$ rsync -axv ext_user@external.system:path/to/remote/data .<br />
<br />
# Disconnect from the screen session with the hotkey 'Ctrl-a d'<br />
# You may now disconnect from ARC, close the lid of your laptop, or turn off the computer.<br />
</pre><br />
<br />
To check whether the transfer has finished, log back in and reattach to the screen session:<br />
<pre><br />
# Login to ARC<br />
$ ssh username@arc.ucalgary.ca<br />
<br />
# Reconnect to the screen session<br />
$ screen -r<br />
<br />
# If the transfer has finished, close the screen session.<br />
$ exit<br />
<br />
# If the transfer is still running, disconnect from the screen session with the hotkey 'Ctrl-a d'<br />
</pre><br />
<br />
=== Very large files ===<br />
If the files are large and the transfer speed is low, the transfer may fail before a file has been completely transferred. <br />
'''rsync''' may not help here, as it may not restart a partially transferred file (this has not been re-tested recently).<br />
<br />
The solution may be to split the large file into smaller chunks, transfer them using rsync and then join them on the remote system (ARC for example):<br />
<br />
<pre><br />
# Large file is 506MB in this example.<br />
$ ls -l t.bin<br />
-rw-r--r-- 1 drozmano drozmano 530308481 Jun 8 11:06 t.bin<br />
<br />
# split the file:<br />
$ split -b 100M t.bin t.bin_chunk.<br />
<br />
# Check the chunks.<br />
$ ls -l t.bin_chunk.*<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.aa<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ab<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ac<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ad<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ae<br />
-rw-r--r-- 1 drozmano drozmano 6020481 Jun 8 11:09 t.bin_chunk.af<br />
<br />
# Transfer the files:<br />
$ rsync -axv t.bin_chunk.* username@arc.ucalgary.ca:<br />
</pre><br />
<br />
Then login to ARC and join the files:<br />
<pre><br />
$ cat t.bin_chunk.* > t.bin<br />
<br />
$ ls -l <br />
-rw-r--r-- 1 drozmano drozmano 530308481 Jun 8 11:06 t.bin<br />
</pre><br />
Success.<br />
<br />
[[Category:Guides]]</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=How_to_transfer_data&diff=1339How to transfer data2021-05-28T16:39:52Z<p>Tthomas: /* Use Globus Web Application to Share Files */</p>
<hr />
<div>{{Message Box<br />
|title=Data Transfer Nodes<br />
|message=<br />
For performance and resource reasons, file transfers should be performed on the data transfer node arc-dtn.ucalgary.ca rather than on the ARC login node. Since the ARC DTN has the same shares as ARC, any files you transfer to the DTN will also be available on ARC.<br />
}}<br />
<br />
<br />
= Command Line File Transfer Tools =<br />
You may use the following command-line file transfer utilities on Linux, MacOS, and Windows. File transfers using these methods require your computer to be on the University of Calgary campus network or via the University of Calgary IT General VPN.<br />
<br />
If you are working on a Windows computer, you will need to install these utilities separately as they are not installed by default. Newer versions of Windows 10 (1903 and up) have '''SSH''' built-in as part of the '''openssh''' package. However, you may be better off using one of the [[#Graphical File Transfer Tools|graphical file transfer tools]] listed in the following section.<br />
<br />
== <code>scp</code>: Secure Copy ==<br />
<code>scp</code> is a secure and encrypted method of transferring files between machines via SSH. It is available on Linux and Mac computers by default and can be installed on Windows by installing the OpenSSH package.<br />
<br />
The general format for the command is:<br />
$ scp [options] source destination<br />
<br />
* The <code>source</code> and <code>destination</code> fields can be a local file / directory or a remote one.<br />
* The ''local location'' is a normal Unix path, absolute or relative and <br />
* The ''remote location'' has a format <code>username@remote.system.name:file/path</code>.<br />
* The ''remote relative file path'' is relative to the home directory of the <code>username</code> on the remote system.<br />
<br />
You may see all the available options with <code>scp</code> by viewing the [http://man7.org/linux/man-pages/man1/scp.1.html man page].<br />
<br />
=== Example Usage ===<br />
Common operations are given below. On your desktop, to:<br />
* Transfer a single file (eg. <code>data.dat</code>) to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp data.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Transfer all files ending with <code>.dat</code> to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp *.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* To transfer an entire directory to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp -r my_data_directory/ username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
<br />
== <code>rsync</code> ==<br />
<code>rsync</code> is a utility for transferring and synchronizing files efficiently. The efficiency for its file synchronization is achieved by its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. <br />
<br />
<code>rsync</code> can be used to copy files and directories locally on a system or between two computers via SSH. Unlike <code>scp</code>, it is designed to synchronize two locations, so partial transfers can be restarted by re-running <code>rsync</code> without losing progress. Resuming a partial transfer is not possible with <code>scp</code>.<br />
<br />
The general format for the command is similar to '''scp''':<br />
$ rsync [options] source destination<br />
* The <code>source</code> and <code>destination</code> fields can be a local file / directory or a remote one.<br />
* <code>rsync</code> '''cannot''' transfer files between '''two remote''' locations. The source or the destination must be a local path.<br />
* The ''local location'' is a normal Unix path, absolute or relative and <br />
* The ''remote location'' has a format <code>username@remote.system.name:file/path</code>.<br />
* The ''remote relative file path'' is relative to the home directory of the <code>username</code> on the remote system.<br />
<br />
You may see all the available options with <code>rsync</code> by viewing the [http://man7.org/linux/man-pages/man1/rsync.1.html man page].<br />
<br />
=== Example Usage ===<br />
Common operations are given below. On your desktop, to:<br />
<br />
* Upload a single file (eg. <code>data.dat</code>) from your workstation to your ARC: <syntaxhighlight lang="bash"><br />
desktop$ rsync -v data.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Upload all files matching a wildcard (eg. ending in <code>*.dat</code>): <syntaxhighlight lang="bash"><br />
$ rsync -v *.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Upload an entire directory (eg. <code>my_data</code> to <code>~/projects/project2</code>): <syntaxhighlight lang="bash"><br />
$ rsync -axv my_data username@arc-dtn.ucalgary.ca:~/projects/project2/<br />
</syntaxhighlight><br />
* Upload more than one directory: <syntaxhighlight lang="bash"><br />
desktop$ rsync -axv my_data1 my_data2 my_data3 username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Download one file (eg. <code>output.dat</code>) from ARC to the current directory on your workstation: <syntaxhighlight lang="bash"><br />
## Note the '.' at the end of the command which references the current working directory on your computer<br />
desktop$ rsync -v username@arc-dtn.ucalgary.ca:projects/project1/output.dat .<br />
</syntaxhighlight><br />
* Download one directory (eg. <code>outputs</code>) from ARC to the current directory on your workstation:<syntaxhighlight lang="bash"><br />
desktop$ rsync -axv username@arc-dtn.ucalgary.ca:projects/project1/outputs .<br />
</syntaxhighlight><br />
<br />
== <code>sftp</code>: secure file transfer protocol ==<br />
<br />
* Manual page on-line: http://man7.org/linux/man-pages/man1/sftp.1.html<br />
<br />
<br />
'''sftp''' is a file transfer program, similar to '''ftp''', <br />
which performs all operations over an encrypted '''ssh''' transport. <br />
It may also use many features of '''ssh''', such as public key authentication and compression.<br />
<br />
<br />
'''sftp''' has an interactive mode, <br />
in which sftp understands a set of commands similar to those of '''ftp'''.<br />
Commands are case insensitive.<br />
<br />
== <code>rclone</code>: rsync for cloud storage ==<br />
'''Rclone''' is a command line program to sync files and directories to and from a number of on-line storage services.<br />
<br />
* https://rclone.org/<br />
<br />
== <code>curl</code> and <code>wget</code>: downloading from the internet ==<br />
To download a file with the ability to resume a partial download:<br />
 curl -C - -O http://example.com/resource.tar.gz<br />
wget -c http://example.com/resource.tar.gz<br />
<br />
<br />
= Graphical File Transfer Tools =<br />
== FileZilla ==<br />
FileZilla is a free cross-platform file transfer program that can transfer files via FTP and SFTP. <br />
<br />
=== Installation ===<br />
You may obtain Filezilla from the project's official website at: https://filezilla-project.org/download.php?type=client. '''Please note''': The official installer may bundle ads and unwanted software. Be careful when clicking through.<br />
<br />
Alternatively, you may obtain Filezilla from Ninite: https://ninite.com/filezilla<br />
<br />
=== Connecting to ARC ===<br />
If working off campus, first connect to the University of Calgary General VPN. Open Filezilla and connect to <code>arc-dtn.ucalgary.ca</code> on port <code>22</code>.<br />
[[File:Filezilla.jpg|alt=Connecting to ARC using Filezilla|none|thumb|Connecting to ARC using Filezilla]] <br />
<br />
== MobaXterm (Windows) ==<br />
'''MobaXterm''' is the recommended tool for remote access and data transfer in '''Windows''' OSes.<br />
<br />
MobaXterm is a one-stop solution for most remote access work on a compute cluster or a Unix / Linux server.<br />
<br />
It provides many Unix like utilities for Windows including an '''SSH''' client and '''X11''' graphics server. It provides a graphical interface for <br />
data transfer operations.<br />
<br />
* Website: https://mobaxterm.mobatek.net/<br />
<br />
== WinSCP (Windows) ==<br />
WinSCP is a free Windows file transfer tool.<br />
<br />
https://winscp.net/eng/index.php<br />
<br />
= Cloud-based File Transfer Services =<br />
== Globus File Transfer ==<br />
Globus File Transfer is a cloud based service for file transfer and file sharing. It uses GridFTP for high speed and reliable data transfers<br><br />
<br />
=== How to get started ===<br />
<br />
# Navigate to the web page https://www.globusid.org <br />
# Create a Globus ID and password using your google account.<br />
<br />
=== Use Globus Web Application to transfer files ===<br />
<br />
# To initiate data transfers using the Globus Web Application, navigate to https://www.globusid.org/login and log into your Globus account <br />
# On the left panel, click on File Manager to define/select the endpoints. For example, to transfer data from ARC cluster (endpoint 1) to Compute Canada cedar cluster (endpoint 2) <br />
#* Under collection, for ARC data transfer node choose 'endpoint 1' as 'ucalgary#arc-dtn.ucalgary.ca' from the drop down menu. Authenticate your access using UCalgary IT credentials. This will bring you to the home directory on ARC. <br />
#* Next, for Compute Canada cedar data transfer node choose 'endpoint 2' as 'computecanada#cedar-dtn' from the drop down menu. Again authenticate your access using Compute Canada credentials. Navigate to the location where you want to transfer the file. <br />
#* Select the file to be transferred from 'endpoint 1' and initiate the transfer process. <br />
<br />
=== Use Globus Web Application to Share Files ===<br />
You may grant access to a folder in your allocation on ARC to be used for uploading or sharing files with any individual with a globus account, either through ComputeCanada or their own institution. <br />
<br />
Please see https://docs.globus.org/how-to/share-files/<br />
<br />
Here is a screen capture with an example using arc-dtn.ucalgary.ca <br />
[[File:DmitriFolder3.png|none|thumb|645x645px|The user tthomas created a folder on ARC:/home/tthomas/dmitriFolder using the Globus Web application and shared it with individuals at external organizations.]]<br />
<br />
<br />
nandit@computecanada.ca has read and write access, ipercel@computecanada.ca has read only access.<br />
<br />
The external identity rozmanov@globusid.org has been granted the ability to assist with adding/administering access.<br />
[[File:DmitriFolder2.png|none|thumb|577x577px|please note: the owner of an ARC allocation is accountable for any activities of the individuals with access granted via GLOBUS. ]]<br />
<br />
<br />
rozmanov@globusid.org may add research collaborators to this allocation on ARC in order to share data.<br />
<br />
= Special cases =<br />
<br />
== Transferring Large Datasets ==<br />
=== Using screen and rsync ===<br />
If you want to transfer a large amount of data from a remote Unix system to ARC you can use ''<code>rsync</code>'' to handle the transfer. However, you will have to keep your SSH session from your workstation connected during the entire transfer which is often not convenient or not feasible. <br />
<br />
To overcome this, one can run the '''rsync''' transfer inside a '''screen''' virtual session on ARC. '''screen''' creates a persistent terminal session on ARC that keeps running after you disconnect, so you can log back in later from your workstation and reattach to it.<br />
<br />
To begin, login to ARC and start <code>screen</code> with the screen command:<br />
<pre><br />
# Login to ARC<br />
$ ssh username@arc.ucalgary.ca<br />
<br />
# Start a screen session<br />
$ screen<br />
<br />
# While in the new screen session, start the transfer with rsync.<br />
$ rsync -axv ext_user@external.system:path/to/remote/data .<br />
<br />
# Disconnect from the screen session with the hotkey 'Ctrl-a d'<br />
# You may now disconnect from ARC, close the lid of your laptop, or turn off the computer.<br />
</pre><br />
<br />
To check whether the transfer has finished, log back in and reattach to the screen session:<br />
<pre><br />
# Login to ARC<br />
$ ssh username@arc.ucalgary.ca<br />
<br />
# Reconnect to the screen session<br />
$ screen -r<br />
<br />
# If the transfer has finished, close the screen session.<br />
$ exit<br />
<br />
# If the transfer is still running, disconnect from the screen session with the hotkey 'Ctrl-a d'<br />
</pre><br />
<br />
=== Very large files ===<br />
If the files are large and the transfer speed is low, the transfer may fail before a file has been completely transferred. <br />
'''rsync''' may not help here, as it may not restart a partially transferred file (this has not been re-tested recently).<br />
<br />
The solution may be to split the large file into smaller chunks, transfer them using rsync and then join them on the remote system (ARC for example):<br />
<br />
<pre><br />
# Large file is 506MB in this example.<br />
$ ls -l t.bin<br />
-rw-r--r-- 1 drozmano drozmano 530308481 Jun 8 11:06 t.bin<br />
<br />
# split the file:<br />
$ split -b 100M t.bin t.bin_chunk.<br />
<br />
# Check the chunks.<br />
$ ls -l t.bin_chunk.*<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.aa<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ab<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ac<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ad<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ae<br />
-rw-r--r-- 1 drozmano drozmano 6020481 Jun 8 11:09 t.bin_chunk.af<br />
<br />
# Transfer the files:<br />
$ rsync -axv t.bin_chunk.* username@arc.ucalgary.ca:<br />
</pre><br />
<br />
Then login to ARC and join the files:<br />
<pre><br />
$ cat t.bin_chunk.* > t.bin<br />
<br />
$ ls -l <br />
-rw-r--r-- 1 drozmano drozmano 530308481 Jun 8 11:06 t.bin<br />
</pre><br />
Success.<br />
<br />
[[Category:Guides]]</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=File:DmitriFolder3.png&diff=1338File:DmitriFolder3.png2021-05-28T16:38:41Z<p>Tthomas: </p>
<hr />
<div>The user tthomas created a folder on ARC:/home/tthomas/dmitriFolder using the Globus Web application and shared it with individuals at external organizations.</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=How_to_transfer_data&diff=1330How to transfer data2021-05-26T12:44:13Z<p>Tthomas: /* Globus File Transfer */</p>
<hr />
<div>{{Message Box<br />
|title=Data Transfer Nodes<br />
|message=<br />
For performance and resource reasons, file transfers should be performed on the data transfer node arc-dtn.ucalgary.ca rather than on the ARC login node. Since the ARC DTN has the same shares as ARC, any files you transfer to the DTN will also be available on ARC.<br />
}}<br />
<br />
<br />
= Command Line File Transfer Tools =<br />
You may use the following command-line file transfer utilities on Linux, MacOS, and Windows. File transfers using these methods require your computer to be on the University of Calgary campus network or via the University of Calgary IT General VPN.<br />
<br />
If you are working on a Windows computer, you will need to install these utilities separately as they are not installed by default. Newer versions of Windows 10 (1903 and up) have '''SSH''' built-in as part of the '''openssh''' package. However, you may be better off using one of the [[#Graphical File Transfer Tools|graphical file transfer tools]] listed in the following section.<br />
<br />
== <code>scp</code>: Secure Copy ==<br />
<code>scp</code> is a secure and encrypted method of transferring files between machines via SSH. It is available on Linux and Mac computers by default and can be installed on Windows by installing the OpenSSH package.<br />
<br />
The general format for the command is:<br />
$ scp [options] source destination<br />
<br />
* The <code>source</code> and <code>destination</code> fields can be a local file / directory or a remote one.<br />
* The ''local location'' is a normal Unix path, absolute or relative and <br />
* The ''remote location'' has a format <code>username@remote.system.name:file/path</code>.<br />
* The ''remote relative file path'' is relative to the home directory of the <code>username</code> on the remote system.<br />
<br />
You may see all the available options with <code>scp</code> by viewing the [http://man7.org/linux/man-pages/man1/scp.1.html man page].<br />
<br />
=== Example Usage ===<br />
Common operations are given below. On your desktop, to:<br />
* Transfer a single file (eg. <code>data.dat</code>) to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp data.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Transfer all files ending with <code>.dat</code> to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp *.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* To transfer an entire directory to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp -r my_data_directory/ username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
<br />
== <code>rsync</code> ==<br />
<code>rsync</code> is a utility for transferring and synchronizing files efficiently. The efficiency for its file synchronization is achieved by its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. <br />
<br />
<code>rsync</code> can be used to copy files and directories locally on a system or between two computers via SSH. Unlike <code>scp</code>, it is designed to synchronize two locations, so partial transfers can be restarted by re-running <code>rsync</code> without losing progress. Resuming a partial transfer is not possible with <code>scp</code>.<br />
<br />
The general format for the command is similar to '''scp''':<br />
$ rsync [options] source destination<br />
* The <code>source</code> and <code>destination</code> fields can be a local file / directory or a remote one.<br />
* <code>rsync</code> '''cannot''' transfer files between '''two remote''' locations. The source or the destination must be a local path.<br />
* The ''local location'' is a normal Unix path, absolute or relative and <br />
* The ''remote location'' has a format <code>username@remote.system.name:file/path</code>.<br />
* The ''remote relative file path'' is relative to the home directory of the <code>username</code> on the remote system.<br />
<br />
You may see all the available options with <code>rsync</code> by viewing the [http://man7.org/linux/man-pages/man1/rsync.1.html man page].<br />
<br />
=== Example Usage ===<br />
Common operations are given below. On your desktop, to:<br />
<br />
* Upload a single file (eg. <code>data.dat</code>) from your workstation to your ARC: <syntaxhighlight lang="bash"><br />
desktop$ rsync -v data.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Upload all files matching a wildcard (eg. ending in <code>*.dat</code>): <syntaxhighlight lang="bash"><br />
$ rsync -v *.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Upload an entire directory (eg. <code>my_data</code> to <code>~/projects/project2</code>): <syntaxhighlight lang="bash"><br />
$ rsync -axv my_data username@arc-dtn.ucalgary.ca:~/projects/project2/<br />
</syntaxhighlight><br />
* Upload more than one directory: <syntaxhighlight lang="bash"><br />
desktop$ rsync -axv my_data1 my_data2 my_data3 username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Download one file (eg. <code>output.dat</code>) from ARC to the current directory on your workstation: <syntaxhighlight lang="bash"><br />
## Note the '.' at the end of the command which references the current working directory on your computer<br />
desktop$ rsync -v username@arc-dtn.ucalgary.ca:projects/project1/output.dat .<br />
</syntaxhighlight><br />
* Download one directory (eg. <code>outputs</code>) from ARC to the current directory on your workstation:<syntaxhighlight lang="bash"><br />
desktop$ rsync -axv username@arc-dtn.ucalgary.ca:projects/project1/outputs .<br />
</syntaxhighlight><br />
<br />
== <code>sftp</code>: secure file transfer protocol ==<br />
<br />
* Manual page on-line: http://man7.org/linux/man-pages/man1/sftp.1.html<br />
<br />
<br />
'''sftp''' is a file transfer program, similar to '''ftp''', <br />
which performs all operations over an encrypted '''ssh''' transport. <br />
It may also use many features of '''ssh''', such as public key authentication and compression.<br />
<br />
<br />
'''sftp''' has an interactive mode, <br />
in which sftp understands a set of commands similar to those of '''ftp'''.<br />
Commands are case insensitive.<br />
<br />
== <code>rclone</code>: rsync for cloud storage ==<br />
'''Rclone''' is a command line program to sync files and directories to and from a number of on-line storage services.<br />
<br />
* https://rclone.org/<br />
<br />
== <code>curl</code> and <code>wget</code>: downloading from the internet ==<br />
To download a file with the ability to resume a partial download:<br />
 curl -C - -O http://example.com/resource.tar.gz<br />
wget -c http://example.com/resource.tar.gz<br />
<br />
<br />
= Graphical File Transfer Tools =<br />
== FileZilla ==<br />
FileZilla is a free cross-platform file transfer program that can transfer files via FTP and SFTP. <br />
<br />
=== Installation ===<br />
You may obtain Filezilla from the project's official website at: https://filezilla-project.org/download.php?type=client. '''Please note''': The official installer may bundle ads and unwanted software. Be careful when clicking through.<br />
<br />
Alternatively, you may obtain Filezilla from Ninite: https://ninite.com/filezilla<br />
<br />
=== Connecting to ARC ===<br />
If working off campus, first connect to the University of Calgary General VPN. Open Filezilla and connect to <code>arc-dtn.ucalgary.ca</code> on port <code>22</code>.<br />
[[File:Filezilla.jpg|alt=Connecting to ARC using Filezilla|none|thumb|Connecting to ARC using Filezilla]] <br />
<br />
== MobaXterm (Windows) ==<br />
'''MobaXterm''' is the recommended tool for remote access and data transfer in '''Windows''' OSes.<br />
<br />
MobaXterm is a one-stop solution for most remote access work on a compute cluster or a Unix / Linux server.<br />
<br />
It provides many Unix like utilities for Windows including an '''SSH''' client and '''X11''' graphics server. It provides a graphical interface for <br />
data transfer operations.<br />
<br />
* Website: https://mobaxterm.mobatek.net/<br />
<br />
== WinSCP (Windows) ==<br />
WinSCP is a free Windows file transfer tool.<br />
<br />
https://winscp.net/eng/index.php<br />
<br />
= Cloud-based File Transfer Services =<br />
== Globus File Transfer ==<br />
Globus File Transfer is a cloud based service for file transfer and file sharing. It uses GridFTP for high speed and reliable data transfers<br><br />
<br />
=== How to get started ===<br />
<br />
# Navigate to the web page https://www.globusid.org <br />
# Create a Globus ID and password using your google account.<br />
<br />
=== Use Globus Web Application to transfer files ===<br />
<br />
# To initiate data transfers using the Globus Web Application, navigate to https://www.globusid.org/login and log into your Globus account <br />
# On the left panel, click on File Manager to define/select the endpoints. For example, to transfer data from ARC cluster (endpoint 1) to Compute Canada cedar cluster (endpoint 2) <br />
#* Under collection, for ARC data transfer node choose 'endpoint 1' as 'ucalgary#arc-dtn.ucalgary.ca' from the drop down menu. Authenticate your access using UCalgary IT credentials. This will bring you to the home directory on ARC. <br />
#* Next, for Compute Canada cedar data transfer node choose 'endpoint 2' as 'computecanada#cedar-dtn' from the drop down menu. Again authenticate your access using Compute Canada credentials. Navigate to the location where you want to transfer the file. <br />
#* Select the file to be transferred from 'endpoint 1' and initiate the transfer process. <br />
<br />
=== Use Globus Web Application to Share Files ===<br />
You may grant access to a folder in your allocation on ARC to be used for uploading or sharing files with any individual with a globus account, either through ComputeCanada or their own institution. <br />
<br />
Please see https://docs.globus.org/how-to/share-files/<br />
<br />
Here is a screen capture with an example using arc-dtn.ucalgary.ca <br />
[[File:DmitriFolder.png|none|thumb|641x641px|The user tthomas created a folder on ARC:/home/tthomas/dmitriFolder using the Globus Web application and shared it with individuals at external organizations. ]]<br />
<br />
<br />
nandit@computecanada.ca has read and write access, ipercel@computecanada.ca has read only access.<br />
<br />
The external identity rozmanov@globusid.org has been granted the ability to assist with adding/administering access.<br />
[[File:DmitriFolder2.png|none|thumb|577x577px|please note: the owner of an ARC allocation is accountable for any activities of the individuals with access granted via GLOBUS. ]]<br />
<br />
<br />
rozmanov@globusid.org may add research collaborators to this allocation on ARC in order to share data.<br />
<br />
= Special cases =<br />
<br />
== Transferring Large Datasets ==<br />
=== Using screen and rsync ===<br />
If you want to transfer a large amount of data from a remote Unix system to ARC, you can use <code>rsync</code> to handle the transfer. However, you will have to keep the SSH session from your workstation connected for the entire transfer, which is often inconvenient or not feasible. <br />
<br />
To overcome this, you can run the '''rsync''' transfer inside a '''screen''' session on ARC. '''screen''' creates a persistent terminal session on ARC that keeps running after you disconnect, and that you can reattach to from a later SSH session from your workstation.<br />
<br />
To begin, log in to ARC and start a <code>screen</code> session:<br />
<pre><br />
# Login to ARC<br />
$ ssh username@arc.ucalgary.ca<br />
<br />
# Start a screen session<br />
$ screen<br />
<br />
# While in the new screen session, start the transfer with rsync.<br />
$ rsync -axv ext_user@external.system:path/to/remote/data .<br />
<br />
# Disconnect from the screen session with the hotkey 'Ctrl-a d'<br />
# You may now disconnect from ARC, close the lid of your laptop, or turn off the computer.<br />
</pre><br />
<br />
To check whether the transfer has finished:<br />
<pre><br />
# Login to ARC<br />
$ ssh username@arc.ucalgary.ca<br />
<br />
# Reconnect to the screen session<br />
$ screen -r<br />
<br />
# If the transfer has finished, close the screen session.<br />
$ exit<br />
<br />
# If the transfer is still running, disconnect from the screen session with the hotkey 'Ctrl-a d'<br />
</pre><br />
<br />
=== Very large files ===<br />
If the files are large and the transfer speed is low, the transfer may fail before a file has been fully transferred. <br />
'''rsync''' may not be of help here, as it may not resume the partial file transfer (this has not been tested recently).<br />
<br />
The solution may be to split the large file into smaller chunks, transfer the chunks using rsync, and then join them on the remote system (ARC, for example):<br />
<br />
<pre><br />
# Large file is 506MB in this example.<br />
$ ls -l t.bin<br />
-rw-r--r-- 1 drozmano drozmano 530308481 Jun 8 11:06 t.bin<br />
<br />
# Split the file into 100 MB chunks:<br />
$ split -b 100M t.bin t.bin_chunk.<br />
<br />
# Check the chunks.<br />
$ ls -l t.bin_chunk.*<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.aa<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ab<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ac<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ad<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ae<br />
-rw-r--r-- 1 drozmano drozmano 6020481 Jun 8 11:09 t.bin_chunk.af<br />
<br />
# Transfer the files:<br />
$ rsync -axv t.bin_chunk.* username@arc.ucalgary.ca:<br />
</pre><br />
<br />
Then login to ARC and join the files:<br />
<pre><br />
$ cat t.bin_chunk.* > t.bin<br />
<br />
$ ls -l <br />
-rw-r--r-- 1 drozmano drozmano 530308481 Jun 8 11:06 t.bin<br />
</pre><br />
Success.<br />
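As an optional integrity check (not part of the original procedure, and assuming <code>md5sum</code> is available on both systems), you can compare checksums of the original file and the rejoined copy:<br />
<pre><br />
# On the source system, before splitting:<br />
$ md5sum t.bin<br />
<br />
# On ARC, after joining the chunks:<br />
$ md5sum t.bin<br />
<br />
# The two commands should print the same checksum.<br />
</pre><br />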
<br />
[[Category:Guides]]</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=How_to_transfer_data&diff=1329How to transfer data2021-05-26T12:37:11Z<p>Tthomas: /* Globus File Transfer */</p>
<hr />
<div>{{Message Box<br />
|title=Data Transfer Nodes<br />
|message=<br />
For performance and resource reasons, file transfers should be performed on the data transfer node arc-dtn.ucalgary.ca rather than on the ARC login node. Since the ARC DTN has the same shares as ARC, any files you transfer to the DTN will also be available on ARC.<br />
}}<br />
<br />
<br />
= Command Line File Transfer Tools =<br />
You may use the following command-line file transfer utilities on Linux, MacOS, and Windows. File transfers using these methods require your computer to be on the University of Calgary campus network or connected via the University of Calgary IT General VPN.<br />
<br />
If you are working on a Windows computer, you will need to install these utilities separately as they are not installed by default. Newer versions of Windows 10 (1903 and up) have '''SSH''' built in as part of the '''openssh''' package. However, you may be better off using one of the tools listed under [[#Graphical File Transfer Tools]] later on this page.<br />
<br />
== <code>scp</code>: Secure Copy ==<br />
<code>scp</code> is a secure and encrypted method of transferring files between machines via SSH. It is available on Linux and Mac computers by default and can be installed on Windows by installing the OpenSSH package.<br />
<br />
The general format for the command is:<br />
$ scp [options] source destination<br />
<br />
* The <code>source</code> and <code>destination</code> fields can be a local file / directory or a remote one.<br />
* The ''local location'' is a normal Unix path, absolute or relative.<br />
* The ''remote location'' has the format <code>username@remote.system.name:file/path</code>.<br />
* The ''remote relative file path'' is relative to the home directory of the <code>username</code> on the remote system.<br />
<br />
You may see all the available options with <code>scp</code> by viewing the [http://man7.org/linux/man-pages/man1/scp.1.html man page].<br />
<br />
=== Example Usage ===<br />
Common operations are given below. On your desktop, to:<br />
* Transfer a single file (eg. <code>data.dat</code>) to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp data.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Transfer all files ending with <code>.dat</code> to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp *.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* To transfer an entire directory to ARC: <syntaxhighlight lang="bash"><br />
desktop$ scp -r my_data_directory/ username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
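* Download a single file from ARC to the current directory on your desktop (the remote path here is only an example): <syntaxhighlight lang="bash"><br />
desktop$ scp username@arc-dtn.ucalgary.ca:projects/project1/output.dat .<br />
</syntaxhighlight><br />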
<br />
== <code>rsync</code> ==<br />
<code>rsync</code> is a utility for transferring and synchronizing files efficiently. The efficiency for its file synchronization is achieved by its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. <br />
<br />
<code>rsync</code> can be used to copy files and directories locally on a system or between two computers via SSH. Unlike <code>scp</code>, it is designed to synchronize two locations, so a partial transfer can be restarted by re-running <code>rsync</code> without losing progress; resuming a partial transfer is not possible with <code>scp</code>.<br />
<br />
The general format for the command is similar to '''scp''':<br />
$ rsync [options] source destination<br />
* The <code>source</code> and <code>destination</code> fields can be a local file / directory or a remote one.<br />
* <code>rsync</code> '''cannot''' transfer files between '''two remote''' locations. The source or the destination must be a local path.<br />
* The ''local location'' is a normal Unix path, absolute or relative.<br />
* The ''remote location'' has the format <code>username@remote.system.name:file/path</code>.<br />
* The ''remote relative file path'' is relative to the home directory of the <code>username</code> on the remote system.<br />
<br />
You may see all the available options with <code>rsync</code> by viewing the [http://man7.org/linux/man-pages/man1/rsync.1.html man page].<br />
<br />
=== Example Usage ===<br />
Common operations are given below. On your desktop, to:<br />
<br />
* Upload a single file (eg. <code>data.dat</code>) from your workstation to ARC: <syntaxhighlight lang="bash"><br />
desktop$ rsync -v data.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Upload all files matching a wildcard (eg. ending in <code>*.dat</code>): <syntaxhighlight lang="bash"><br />
$ rsync -v *.dat username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Upload an entire directory (eg. <code>my_data</code> to <code>~/projects/project2</code>): <syntaxhighlight lang="bash"><br />
$ rsync -axv my_data username@arc-dtn.ucalgary.ca:projects/project2/<br />
</syntaxhighlight><br />
* Upload more than one directory: <syntaxhighlight lang="bash"><br />
desktop$ rsync -axv my_data1 my_data2 my_data3 username@arc-dtn.ucalgary.ca:/desired/destination<br />
</syntaxhighlight><br />
* Download one file (eg. <code>output.dat</code>) from ARC to the current directory on your workstation: <syntaxhighlight lang="bash"><br />
## Note the '.' at the end of the command which references the current working directory on your computer<br />
desktop$ rsync -v username@arc-dtn.ucalgary.ca:projects/project1/output.dat .<br />
</syntaxhighlight><br />
* Download one directory (eg. <code>outputs</code>) from ARC to the current directory on your workstation:<syntaxhighlight lang="bash"><br />
desktop$ rsync -axv username@arc-dtn.ucalgary.ca:projects/project1/outputs .<br />
</syntaxhighlight><br />
<br />
== <code>sftp</code>: secure file transfer protocol ==<br />
<br />
* Manual page on-line: http://man7.org/linux/man-pages/man1/sftp.1.html<br />
<br />
<br />
'''sftp''' is a file transfer program, similar to '''ftp''', <br />
which performs all operations over an encrypted '''ssh''' transport. <br />
It may also use many features of '''ssh''', such as public key authentication and compression.<br />
<br />
<br />
'''sftp''' has an interactive mode, <br />
in which sftp understands a set of commands similar to those of '''ftp'''.<br />
Commands are case insensitive.<br />
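Below is a minimal example session with the ARC data transfer node; the file names are placeholders. <code>put</code> uploads a local file, <code>get</code> downloads a remote file, <code>pwd</code> and <code>lpwd</code> print the remote and local working directories, and <code>bye</code> ends the session.<br />
<pre><br />
# Connect to the ARC data transfer node.<br />
desktop$ sftp username@arc-dtn.ucalgary.ca<br />
<br />
# Typical interactive commands:<br />
sftp> pwd<br />
sftp> lpwd<br />
sftp> put data.dat<br />
sftp> get results/output.dat<br />
sftp> bye<br />
</pre><br />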
<br />
== <code>rclone</code>: rsync for cloud storage ==<br />
'''Rclone''' is a command line program to sync files and directories to and from a number of on-line storage services.<br />
<br />
* https://rclone.org/<br />
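A minimal sketch of typical usage (the remote name <code>myremote</code> and bucket path are placeholders; a remote must first be defined interactively with <code>rclone config</code>):<br />
<pre><br />
# Define a remote (for example S3, Google Drive, OneDrive) interactively.<br />
$ rclone config<br />
<br />
# List what is stored under the remote.<br />
$ rclone ls myremote:mybucket<br />
<br />
# Copy a local directory to the remote, and back again.<br />
$ rclone copy my_data myremote:mybucket/my_data<br />
$ rclone copy myremote:mybucket/my_data my_data<br />
</pre><br />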
<br />
== <code>curl</code> and <code>wget</code>: downloading from the internet ==<br />
To download a file from the internet with the ability to resume a partial download (<code>-C -</code> for curl, <code>-c</code> for wget):<br />
 curl -C - -O http://example.com/resource.tar.gz<br />
 wget -c http://example.com/resource.tar.gz<br />
<br />
<br />
= Graphical File Transfer Tools =<br />
== FileZilla ==<br />
FileZilla is a free cross-platform file transfer program that can transfer files via FTP and SFTP. <br />
<br />
=== Installation ===<br />
You may obtain Filezilla from the project's official website at: https://filezilla-project.org/download.php?type=client. '''Please note''': The official installer may bundle ads and unwanted software. Be careful when clicking through.<br />
<br />
Alternatively, you may obtain Filezilla from Ninite: https://ninite.com/filezilla<br />
<br />
=== Connecting to ARC ===<br />
If working off campus, first connect to the University of Calgary General VPN. Open Filezilla and connect to <code>arc-dtn.ucalgary.ca</code> on port <code>22</code>.<br />
[[File:Filezilla.jpg|alt=Connecting to ARC using Filezilla|none|thumb|Connecting to ARC using Filezilla]] <br />
<br />
== MobaXterm (Windows) ==<br />
'''MobaXterm''' is the recommended tool for remote access and data transfer in '''Windows''' OSes.<br />
<br />
MobaXterm is a one-stop solution for most remote access work on a compute cluster or a Unix / Linux server.<br />
<br />
It provides many Unix-like utilities for Windows, including an '''SSH''' client and an '''X11''' graphics server, as well as a graphical interface for data transfer operations.<br />
<br />
* Website: https://mobaxterm.mobatek.net/<br />
<br />
== WinSCP (Windows) ==<br />
WinSCP is a free Windows file transfer tool.<br />
<br />
https://winscp.net/eng/index.php<br />
<br />
= Cloud-based File Transfer Services =<br />
== Globus File Transfer ==<br />
Globus File Transfer is a cloud-based service for file transfer and file sharing. It uses GridFTP for high-speed, reliable data transfers.<br><br />
<br />
=== How to get started ===<br />
<br />
# Navigate to the web page https://www.globusid.org <br />
# Create a Globus ID and password using your Google account.<br />
<br />
=== Use Globus Web Application to transfer files ===<br />
<br />
# To initiate data transfers using the Globus Web Application, navigate to https://www.globusid.org/login and log into your Globus account <br />
# On the left panel, click on File Manager to define/select the endpoints. For example, to transfer data from ARC cluster (endpoint 1) to Compute Canada cedar cluster (endpoint 2) <br />
#* Under collection, for ARC data transfer node choose 'endpoint 1' as 'ucalgary#arc-dtn.ucalgary.ca' from the drop down menu. Authenticate your access using UCalgary IT credentials. This will bring you to the home directory on ARC. <br />
#* Next, for Compute Canada cedar data transfer node choose 'endpoint 2' as 'computecanada#cedar-dtn' from the drop down menu. Again authenticate your access using Compute Canada credentials. Navigate to the location where you want to transfer the file. <br />
#* Select the file to be transferred from 'endpoint 1' and initiate the transfer process. <br />
'''Use Globus Web Application to Share Files'''<br />
<br />
Please see https://docs.globus.org/how-to/share-files/<br />
<br />
Here is a screen capture with an example using arc-dtn.ucalgary.ca <br />
[[File:DmitriFolder.png|none|thumb|641x641px|The user tthomas created a folder on ARC:/home/tthomas/dmitriFolder using the Globus Web application and shared it with individuals at external organizations. ]]<br />
<br />
<br />
nandit@computecanada.ca has read and write access, ipercel@computecanada.ca has read only access.<br />
<br />
The external identity rozmanov@globusid.org has been granted the ability to assist with adding/administering access.<br />
[[File:DmitriFolder2.png|none|thumb|577x577px|please note: the owner of an ARC allocation is accountable for any activities of the individuals with access granted via GLOBUS. ]]<br />
<br />
<br />
rozmanov@globusid.org may add research collaborators to this allocation on ARC in order to share data.<br />
<br />
= Special cases =<br />
<br />
== Transferring Large Datasets ==<br />
=== Using screen and rsync ===<br />
If you want to transfer a large amount of data from a remote Unix system to ARC, you can use <code>rsync</code> to handle the transfer. However, you will have to keep the SSH session from your workstation connected for the entire transfer, which is often inconvenient or not feasible. <br />
<br />
To overcome this, you can run the '''rsync''' transfer inside a '''screen''' session on ARC. '''screen''' creates a persistent terminal session on ARC that keeps running after you disconnect, and that you can reattach to from a later SSH session from your workstation.<br />
<br />
To begin, login to ARC and start <code>screen</code> with the screen command:<br />
<pre><br />
# Login to ARC<br />
$ ssh username@arc.ucalgary.ca<br />
<br />
# Start a screen session<br />
$ screen<br />
<br />
# While in the new screen session, start the transfer with rsync.<br />
$ rsync -axv ext_user@external.system:path/to/remote/data .<br />
<br />
# Disconnect from the screen session with the hotkey 'Ctrl-a d'<br />
# You may now disconnect from ARC, close the lid of your laptop, or turn off the computer.<br />
</pre><br />
<br />
To check whether the transfer has finished:<br />
<pre><br />
# Login to ARC<br />
$ ssh username@arc.ucalgary.ca<br />
<br />
# Reconnect to the screen session<br />
$ screen -r<br />
<br />
# If the transfer has finished, close the screen session.<br />
$ exit<br />
<br />
# If the transfer is still running, disconnect from the screen session with the hotkey 'Ctrl-a d'<br />
</pre><br />
<br />
=== Very large files ===<br />
If the files are large and the transfer speed is low, the transfer may fail before a file has been fully transferred. <br />
'''rsync''' may not be of help here, as it may not resume the partial file transfer (this has not been tested recently).<br />
<br />
The solution may be to split the large file into smaller chunks, transfer them using rsync and then join them on the remote system (ARC for example):<br />
<br />
<pre><br />
# Large file is 506MB in this example.<br />
$ ls -l t.bin<br />
-rw-r--r-- 1 drozmano drozmano 530308481 Jun 8 11:06 t.bin<br />
<br />
# split the file:<br />
$ split -b 100M t.bin t.bin_chunk.<br />
<br />
# Check the chunks.<br />
$ ls -l t.bin_chunk.*<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.aa<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ab<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ac<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ad<br />
-rw-r--r-- 1 drozmano drozmano 104857600 Jun 8 11:09 t.bin_chunk.ae<br />
-rw-r--r-- 1 drozmano drozmano 6020481 Jun 8 11:09 t.bin_chunk.af<br />
<br />
# Transfer the files:<br />
$ rsync -axv t.bin_chunk.* username@arc.ucalgary.ca:<br />
</pre><br />
<br />
Then login to ARC and join the files:<br />
<pre><br />
$ cat t.bin_chunk.* > t.bin<br />
<br />
$ ls -l <br />
-rw-r--r-- 1 drozmano drozmano 530308481 Jun 8 11:06 t.bin<br />
</pre><br />
Success.<br />
<br />
[[Category:Guides]]</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=File:DmitriFolder2.png&diff=1328File:DmitriFolder2.png2021-05-26T12:25:17Z<p>Tthomas: </p>
<hr />
<div>Dmitri may add collaborators from their research group to the ARC resource at /home/tthomas/dmitriFolder.<br />
<br />
note: tthomas as the owner of the ARC allocation is accountable for any activities of the members added to this folder. Do not grant admin access lightly.</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=File:DmitriFolder.png&diff=1327File:DmitriFolder.png2021-05-26T12:13:43Z<p>Tthomas: </p>
<hr />
<div>test share with Globus on ARC</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=MARC_Cluster_Guide&diff=1323MARC Cluster Guide2021-05-06T21:19:01Z<p>Tthomas: /* Python */</p>
<hr />
<div>{{Message Box<br />
|icon=Security Icon.png<br />
|title=Cybersecurity awareness at the U of C<br />
|message=Please note that there are typically about 950 phishing attempts targeting University of Calgary accounts each month. This is just a reminder to be careful about computer security issues, both at home and at the University. Please visit https://it.ucalgary.ca/it-security for more information, tips on secure computing, and how to report suspected security problems.}}<br />
<br />
{{Message Box<br />
|title=Need Help or have other MARC Related Questions?<br />
|message=For all general RCS related issues, questions, or comments, please contact us at support@hpc.ucalgary.ca.<br />
|icon=Support Icon.png}}<br />
<br />
<br />
This guide gives an overview of the MARC (Medical Advanced Research Computing) cluster at the University of Calgary.<br />
<br />
It is intended to be read by new account holders getting started on MARC, covering such topics as the hardware and performance characteristics, available software, usage policies and how to log in.<br />
<br />
If you are looking for how to login to MARC or how to get an account, please see [[MARC_accounts]]<br />
<br />
== Introduction ==<br />
MARC is a cluster of Linux-based computers purchased in 2019.<br />
<br />
The MARC cluster has been designed with controls appropriate for Level 3 and Level 4 classified data. The University of Calgary Information Security Classification Standard is published here: https://www.ucalgary.ca/policies/files/policies/im010-03-security-standard_0.pdf<br />
<br />
Due to security requirements for Level 3/4 data, some necessary restrictions have been placed on MARC to prevent accidental (or otherwise) data exfiltration.<br />
* Compute nodes and login nodes have no access to the internet<br />
* All data must be ingested to MARC by first copying it to SCDS (Secure Compute Data Store) and then fetching it from SCDS to MARC.<br />
* Resulting data (outputs of analyses) must be copied to SCDS and then fetched from SCDS to wherever it needs to go using established means.<br />
* All file accesses are recorded for auditing purposes.<br />
* ssh connections to MARC must be made through the IT Citrix system (the Admin VPN is neither sufficient nor necessary)<br />
* All accounts must be IT accounts<br />
* A project ID is required to use MARC. This project ID is the same number that is used on SCDS<br />
<br />
== Hardware ==<br />
MARC has compute nodes of two different varieties: <br />
* 8 GPU (Graphics Processing Unit)-enabled nodes containing:<br />
** 40-cores: each node having 2 sockets. Each socket has an Intel Xeon Gold 6148 20-core processor, running at 2.4 GHz. <br />
** The 40 cores on each compute node share about 750 GB of RAM (memory), but jobs should request no more than 753000 MB.<br />
** Two Tesla V100-PCIE-16GB GPUs.<br />
* 1 Bigmem Node<br />
** 80-cores: node with 4 sockets. Each socket has an Intel Xeon Gold 6148 20-core processor, running at 2.4 GHz. <br />
** The 80 cores on the node share about 3 TB of RAM (memory), but jobs should request no more than 3000000 MB.<br />
<br />
=== cpu2019 ===<br />
Allows non-GPU jobs to use:<br />
* Up to 38 cpus per node <br />
* No gpus.<br />
* Up to 500GB memory<br />
* Are the same <br />
<br />
=== gpu2019 ===<br />
Allows jobs requiring nVidia v100 gpu jobs to use:<br />
* 1 or 2 gpus per node <br />
* Up to 40 cpus per node.<br />
* Up to 750GB memory<br />
<br />
=== bigmem ===<br />
For very large memory jobs:<br />
* Up to 80 cpus<br />
* Up to 3TB memory<br />
* No gpus<br />
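As an illustration only, and assuming MARC uses the same Slurm scheduler as the other RCS clusters (see [[Running_jobs]]), a job script requesting one GPU on the gpu2019 partition might look like the sketch below. The script name and resource values are placeholders, and the GPU <code>gres</code> name is an assumption.<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --job-name=example-gpu<br />
#SBATCH --partition=gpu2019<br />
#SBATCH --gres=gpu:1          # one of the node's two V100 GPUs (gres name assumed)<br />
#SBATCH --cpus-per-task=8<br />
#SBATCH --mem=64G<br />
#SBATCH --time=04:00:00<br />
<br />
# Run your analysis with whatever environment you have set up (see the Python section below).<br />
python my_analysis.py<br />
</pre><br />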
<br />
== Storage ==<br />
About a petabyte of raw disk storage is available to the MARC cluster, but for error checking and performance reasons, the amount of usable storage for researchers' projects is considerably less than that. From a user's perspective, the total amount of storage is less important than the individual storage limits. As described below, there are two storage areas: home and project.<br />
<br />
=== Home file system: /home ===<br />
There is a per-user quota of 25 GB under /home. This limit is fixed and cannot be increased. Each user has a directory under /home, which is the default working directory when logging in to MARC. It is expected that most researchers will work from /project and only use home for software and such things. /home is expected to be used only for L1/L2 data and not for your patient identifiable files. The identifiable files go in the appropriate directory under /project.<br />
<br />
=== Project file system for larger projects: /project ===<br />
Directories will be created in /project named after your project ID. This name will be the same as your SCDS share name. The expectation is that all files to do with that project will be stored in /project/projectid. Quotas in /project are somewhat flexible. Please write to support@hpc.ucalgary.ca with an estimate of how much space you will require.<br />
<br />
== Software installations ==<br />
<br />
=== Python ===<br />
<br />
There are some complications in using Python on MARC relative to using ARC. <br />
Normally, we would recommend installing conda in the user's home directory. <br />
On MARC, security requirements for working with L4 data require that we block outgoing and incoming internet connections. <br />
As a result, new packages cannot be downloaded with conda. <br />
<br />
Depending on what you need, the two recommendations we can make are:<br />
<br />
* Download the standard anaconda distribution from the anaconda website to a personal computer: https://repo.anaconda.com/archive/Anaconda3-2020.07-Linux-x86_64.sh<br />
** Transfer the script to MARC via SCDS<br />
** Copy it to your /home directory<br />
** Install it in your home directory with <code>$ bash Anaconda3-2020.07-Linux-x86_64.sh</code><br />
** You will be asked to agree to a license agreement and to confirm that you wish to create the folder <code>anaconda3</code>. Once the installation completes, you will have a new directory <code>~/anaconda3</code> under your home directory. To use the local conda instance, add it to your path with <code>$ export PATH=~/anaconda3/bin:$PATH</code> (see the sketch after this list).<br />
* Download a docker container with the software that you need including python (e.g. tensorflow-gpu)<br />
** Transfer the docker container to MARC via SCDS<br />
** Copy it to your /home directory<br />
** Run it with singularity<br />
<br />
<br />
* Non-open source software which requires a connection to a license server may require admin assistance to set up. Contact [mailto:support@hpc.ucalgary.ca support@hpc.ucalgary.ca] for support.<br /><br />
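The installation steps above, collected as a minimal shell sketch (the installer version, container image, and script names are placeholders; adapt them to your project):<br />
<pre><br />
# Install Anaconda into your home directory from the installer you copied in via SCDS.<br />
$ bash Anaconda3-2020.07-Linux-x86_64.sh<br />
<br />
# Put the local conda and python first on your path for this session.<br />
$ export PATH=~/anaconda3/bin:$PATH<br />
$ which python<br />
<br />
# Alternatively, run software from a container image copied in via SCDS (names are placeholders).<br />
$ singularity exec tensorflow-gpu.sif python my_analysis.py<br />
</pre><br />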
<br />
== Further Reading ==<br />
See [[Running_jobs]] for information on starting a job<br />
<br />
[[Category:MARC]]<br />
[[Category:Guides]]</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=MARC_Cluster_Guide&diff=1322MARC Cluster Guide2021-05-06T21:17:46Z<p>Tthomas: /* Software installations */</p>
<hr />
<div>{{Message Box<br />
|icon=Security Icon.png<br />
|title=Cybersecurity awareness at the U of C<br />
|message=Please note that there are typically about 950 phishing attempts targeting University of Calgary accounts each month. This is just a reminder to be careful about computer security issues, both at home and at the University. Please visit https://it.ucalgary.ca/it-security for more information, tips on secure computing, and how to report suspected security problems.}}<br />
<br />
{{Message Box<br />
|title=Need Help or have other MARC Related Questions?<br />
|message=For all general RCS related issues, questions, or comments, please contact us at support@hpc.ucalgary.ca.<br />
|icon=Support Icon.png}}<br />
<br />
<br />
This guide gives an overview of the MARC (Medical Advanced Research Computing) cluster at the University of Calgary.<br />
<br />
It is intended to be read by new account holders getting started on MARC, covering such topics as the hardware and performance characteristics, available software, usage policies and how to log in.<br />
<br />
If you are looking for how to login to MARC or how to get an account, please see [[MARC_accounts]]<br />
<br />
== Introduction ==<br />
MARC is a cluster comprised of Linux-based computers purchased in 2019<br />
<br />
The MARC cluster has been designed with controls appropriate for Level 3 and Level 4 classified data. The University of Calgary Information Security Classification Standard is published here: https://www.ucalgary.ca/policies/files/policies/im010-03-security-standard_0.pdf<br />
<br />
Due to security requirements for Level 3/4 data, some necessary restrictions have been placed on MARC to prevent accidental (or otherwise) data exfiltration.<br />
* Compute nodes and login nodes have no access to the internet<br />
* All data must be ingested to MARC by first copying it to SCDS (Secure Compute Data Store) and then fetching it from SCDS to MARC.<br />
* Resulting data (outputs of analyses) must be copied to SCDS and then fetched from SCDS to wherever it needs to go using established means.<br />
* All file accesses are recorded for auditing purposes.<br />
* ssh connections to MARC must be through the IT Citrix system (Admin VPN is not sufficient nor necessary)<br />
* All accounts must be IT accounts<br />
* A project ID is required to use MARC. This project ID is the same number that is used on SCDS<br />
<br />
== Hardware ==<br />
MARC has compute nodes of two different varieties: <br />
* 8 GPU (Graphics Processing Unit)-enabled nodes containing:<br />
** 40-cores: each node having 2 sockets. Each socket has an Intel Xeon Gold 6148 20-core processor, running at 2.4 GHz. <br />
** The 40 cores on the individual compute nodes share about 750 GB of RAM (memory) but, jobs should request no more than 753000 MB.<br />
** Two Tesla V100-PCIE-16GB GPUs.<br />
* 1 Bigmem Node<br />
** 80-cores: node with 4 sockets. Each socket has an Intel Xeon Gold 6148 20-core processor, running at 2.4 GHz. <br />
** The 80 cores on the node share about 3 TB of RAM (memory), but, jobs should request no more than 3000000 MB.<br />
<br />
=== cpu2019 ===<br />
Allows non-GPU jobs to use:<br />
* Up to 38 cpus per node <br />
* No gpus.<br />
* Up to 500GB memory<br />
* Are the same <br />
<br />
=== gpu2019 ===<br />
Allows jobs requiring nVidia v100 gpu jobs to use:<br />
* 1 or 2 gpus per node <br />
* Up to 40 cpus per node.<br />
* Up to 750GB memory<br />
<br />
=== bigmem ===<br />
For very large memory jobs:<br />
* Up to 80 cpus<br />
* Up to 3TB memory<br />
* No gpus<br />
<br />
== Storage ==<br />
About a petabyte of raw disk storage is available to the MARC cluster, but for error checking and performance reasons, the amount of usable storage for researchers' projects is considerably less than that. From a user's perspective, the total amount of storage is less important than the individual storage limits. As described below, there are two storage areas: home and project.<br />
<br />
=== Home file system: /home ===<br />
There is a per-user quota of 25 GB under /home. This limit is fixed and cannot be increased. Each user has a directory under /home, which is the default working directory when logging in to MARC. It is expected that most researchers will work from /project and only use home for software and such things. /home is expected to be used only for L1/L2 data and not for your patient identifiable files. The identifiable files go in the appropriate directory under /project.<br />
<br />
=== Project file system for larger projects: /project ===<br />
Directories will be created in /project named after your project ID. This name will be the same as your SCDS share name. The expectation is that all files to do with that project will be stored in /project/projectid. Quotas in /project are somewhat flexible. Please write to support@hpc.ucalgary.ca with an estimate of how much space you will require.<br />
<br />
== Software installations ==<br />
<br />
=== Python ===<br />
<br />
There are some complications in using Python on MARC relative to using ARC. <br />
Normally, we would recommend installing conda in user's home directory. <br />
On MARC, security requirements for working with L4 data require that we block outgoing and incoming internet connections. <br />
As a result, new packages cannot be downloaded with conda. <br />
<br />
Depending on what you need, the two recommendations we can make are<br />
<br />
* Download the standard anaconda distribution from the anaconda website to a personal computer: https://repo.anaconda.com/archive/Anaconda3-2020.07-Linux-x86_64.sh<br />
** Transfer the script to MARC via SCDS<br />
** Copy it to your /home directory<br />
** Install it in your home directory with <code>$ bash Anaconda3-2020.07-Linux-x86_64.sh</code><br />
** You will be asked to agree to a license agreement and to confirm that you wish to create the folder <code>anaconda3</code>. Once the installation completes, you will have a new directory <code>~/anaconda3</code> under your home directory. To use the local conda instance, add it to your path with <code>$ export PATH=~/anaconda3/bin:$PATH</code><br />
<br />
<br />
* Non-open source software which requires a connection to a license server may require admin assistance to set up. contact [mailto:support@hpc.ucalgary.ca support@hpc.ucalgary.ca] for support.<br /><br />
* Download a docker container with the software that you need including python (e.g. tensorflow-gpu)<br />
** Transfer the docker container to MARC via SCDS<br />
** Copy it to your /home directory<br />
** Run it with singularity<br />
<br />
== Further Reading ==<br />
See [[Running_jobs]] for information on starting a job<br />
<br />
[[Category:MARC]]<br />
[[Category:Guides]]</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=MARC_Cluster_Guide&diff=1321MARC Cluster Guide2021-05-06T21:16:45Z<p>Tthomas: spacing, cosmetic</p>
<hr />
<div>{{Message Box<br />
|icon=Security Icon.png<br />
|title=Cybersecurity awareness at the U of C<br />
|message=Please note that there are typically about 950 phishing attempts targeting University of Calgary accounts each month. This is just a reminder to be careful about computer security issues, both at home and at the University. Please visit https://it.ucalgary.ca/it-security for more information, tips on secure computing, and how to report suspected security problems.}}<br />
<br />
{{Message Box<br />
|title=Need Help or have other MARC Related Questions?<br />
|message=For all general RCS related issues, questions, or comments, please contact us at support@hpc.ucalgary.ca.<br />
|icon=Support Icon.png}}<br />
<br />
<br />
This guide gives an overview of the MARC (Medical Advanced Research Computing) cluster at the University of Calgary.<br />
<br />
It is intended to be read by new account holders getting started on MARC, covering such topics as the hardware and performance characteristics, available software, usage policies and how to log in.<br />
<br />
If you are looking for how to login to MARC or how to get an account, please see [[MARC_accounts]]<br />
<br />
== Introduction ==<br />
MARC is a cluster comprised of Linux-based computers purchased in 2019<br />
<br />
The MARC cluster has been designed with controls appropriate for Level 3 and Level 4 classified data. The University of Calgary Information Security Classification Standard is published here: https://www.ucalgary.ca/policies/files/policies/im010-03-security-standard_0.pdf<br />
<br />
Due to security requirements for Level 3/4 data, some necessary restrictions have been placed on MARC to prevent accidental (or otherwise) data exfiltration.<br />
* Compute nodes and login nodes have no access to the internet<br />
* All data must be ingested to MARC by first copying it to SCDS (Secure Compute Data Store) and then fetching it from SCDS to MARC.<br />
* Resulting data (outputs of analyses) must be copied to SCDS and then fetched from SCDS to wherever it needs to go using established means.<br />
* All file accesses are recorded for auditing purposes.<br />
* ssh connections to MARC must be through the IT Citrix system (Admin VPN is not sufficient nor necessary)<br />
* All accounts must be IT accounts<br />
* A project ID is required to use MARC. This project ID is the same number that is used on SCDS<br />
<br />
== Hardware ==<br />
MARC has compute nodes of two different varieties: <br />
* 8 GPU (Graphics Processing Unit)-enabled nodes containing:<br />
** 40-cores: each node having 2 sockets. Each socket has an Intel Xeon Gold 6148 20-core processor, running at 2.4 GHz. <br />
** The 40 cores on the individual compute nodes share about 750 GB of RAM (memory) but, jobs should request no more than 753000 MB.<br />
** Two Tesla V100-PCIE-16GB GPUs.<br />
* 1 Bigmem Node<br />
** 80-cores: node with 4 sockets. Each socket has an Intel Xeon Gold 6148 20-core processor, running at 2.4 GHz. <br />
** The 80 cores on the node share about 3 TB of RAM (memory), but, jobs should request no more than 3000000 MB.<br />
<br />
=== cpu2019 ===<br />
Allows non-GPU jobs to use:<br />
* Up to 38 cpus per node <br />
* No gpus.<br />
* Up to 500GB memory<br />
* Are the same <br />
<br />
=== gpu2019 ===<br />
Allows jobs requiring nVidia v100 gpu jobs to use:<br />
* 1 or 2 gpus per node <br />
* Up to 40 cpus per node.<br />
* Up to 750GB memory<br />
<br />
=== bigmem ===<br />
For very large memory jobs:<br />
* Up to 80 cpus<br />
* Up to 3TB memory<br />
* No gpus<br />
<br />
== Storage ==<br />
About a petabyte of raw disk storage is available to the MARC cluster, but for error checking and performance reasons, the amount of usable storage for researchers' projects is considerably less than that. From a user's perspective, the total amount of storage is less important than the individual storage limits. As described below, there are two storage areas: home and project.<br />
<br />
=== Home file system: /home ===<br />
There is a per-user quota of 25 GB under /home. This limit is fixed and cannot be increased. Each user has a directory under /home, which is the default working directory when logging in to MARC. It is expected that most researchers will work from /project and only use home for software and such things. /home is expected to be used only for L1/L2 data and not for your patient identifiable files. The identifiable files go in the appropriate directory under /project.<br />
<br />
=== Project file system for larger projects: /project ===<br />
Directories will be created in /project named after your project ID. This name will be the same as your SCDS share name. The expectation is that all files to do with that project will be stored in /project/projectid. Quotas in /project are somewhat flexible. Please write to support@hpc.ucalgary.ca with an estimate of how much space you will require.<br />
<br />
== Software installations ==<br />
<br />
=== Python ===<br />
<br />
There are some complications in using Python on MARC relative to using ARC. <br />
Normally, we would recommend installing conda in user's home directory. <br />
On MARC, security requirements for working with L4 data require that we block outgoing and incoming internet connections. <br />
As a result, new packages cannot be downloaded with conda. <br />
<br />
Depending on what you need, the two recommendations we can make are<br />
<br />
* Download the standard anaconda distribution from the anaconda website to a personal computer: https://repo.anaconda.com/archive/Anaconda3-2020.07-Linux-x86_64.sh<br />
** Transfer the script to MARC via SCDS<br />
** Copy it to your /home directory<br />
** Install it in your home directory with <code>$ bash Anaconda3-2020.07-Linux-x86_64.sh</code><br />
** You will be asked to agree to a license agreement and to confirm that you wish to create the folder <code>anaconda3</code>. Once the installation completes, you will have a new directory <code>~/anaconda3</code> under your home directory. To use the local conda instance, add it to your path with <code>$ export PATH=~/anaconda3/bin:$PATH</code><br />
* Non-open source software which requires a connection to a license server may require admin assistance to set up. contact [mailto:support@hpc.ucalgary.ca support@hpc.ucalgary.ca] for support.<br /><br />
* Download a docker container with the software that you need including python (e.g. tensorflow-gpu)<br />
** Transfer the docker container to MARC via SCDS<br />
** Copy it to your /home directory<br />
** Run it with singularity<br />
<br />
== Further Reading ==<br />
See [[Running_jobs]] for information on starting a job<br />
<br />
[[Category:MARC]]<br />
[[Category:Guides]]</div>Tthomashttps://rcs.ucalgary.ca/index.php?title=MARC_Cluster_Guide&diff=1320MARC Cluster Guide2021-05-06T21:16:02Z<p>Tthomas: /* Software installations */</p>
<hr />
<div>{{Message Box<br />
|icon=Security Icon.png<br />
|title=Cybersecurity awareness at the U of C<br />
|message=Please note that there are typically about 950 phishing attempts targeting University of Calgary accounts each month. This is just a reminder to be careful about computer security issues, both at home and at the University. Please visit https://it.ucalgary.ca/it-security for more information, tips on secure computing, and how to report suspected security problems.}}<br />
<br />
{{Message Box<br />
|title=Need Help or have other MARC Related Questions?<br />
|message=For all general RCS related issues, questions, or comments, please contact us at support@hpc.ucalgary.ca.<br />
|icon=Support Icon.png}}<br />
<br />
<br />
This guide gives an overview of the MARC (Medical Advanced Research Computing) cluster at the University of Calgary.<br />
<br />
It is intended to be read by new account holders getting started on MARC, covering such topics as the hardware and performance characteristics, available software, usage policies and how to log in.<br />
<br />
If you are looking for how to login to MARC or how to get an account, please see [[MARC_accounts]]<br />
<br />
== Introduction ==<br />
MARC is a cluster comprised of Linux-based computers purchased in 2019<br />
<br />
The MARC cluster has been designed with controls appropriate for Level 3 and Level 4 classified data. The University of Calgary Information Security Classification Standard is published here: https://www.ucalgary.ca/policies/files/policies/im010-03-security-standard_0.pdf<br />
<br />
Due to security requirements for Level 3/4 data, some necessary restrictions have been placed on MARC to prevent accidental (or otherwise) data exfiltration.<br />
* Compute nodes and login nodes have no access to the internet<br />
* All data must be ingested to MARC by first copying it to SCDS (Secure Compute Data Store) and then fetching it from SCDS to MARC.<br />
* Resulting data (outputs of analyses) must be copied to SCDS and then fetched from SCDS to wherever it needs to go using established means.<br />
* All file accesses are recorded for auditing purposes.<br />
* ssh connections to MARC must be through the IT Citrix system (Admin VPN is not sufficient nor necessary)<br />
* All accounts must be IT accounts<br />
* A project ID is required to use MARC. This project ID is the same number that is used on SCDS<br />
<br />
== Hardware ==<br />
MARC has compute nodes of two different varieties: <br />
* 8 GPU (Graphics Processing Unit)-enabled nodes containing:<br />
** 40-cores: each node having 2 sockets. Each socket has an Intel Xeon Gold 6148 20-core processor, running at 2.4 GHz. <br />
** The 40 cores on the individual compute nodes share about 750 GB of RAM (memory) but, jobs should request no more than 753000 MB.<br />
** Two Tesla V100-PCIE-16GB GPUs.<br />
* 1 Bigmem Node<br />
** 80-cores: node with 4 sockets. Each socket has an Intel Xeon Gold 6148 20-core processor, running at 2.4 GHz. <br />
** The 80 cores on the node share about 3 TB of RAM (memory), but, jobs should request no more than 3000000 MB.<br />
<br />
=== cpu2019 ===<br />
Allows non-GPU jobs to use:<br />
* Up to 38 cpus per node <br />
* No gpus.<br />
* Up to 500GB memory<br />
* Are the same <br />
<br />
=== gpu2019 ===<br />
Allows jobs requiring nVidia v100 gpu jobs to use:<br />
* 1 or 2 gpus per node <br />
* Up to 40 cpus per node.<br />
* Up to 750GB memory<br />
<br />
=== bigmem ===<br />
For very large memory jobs:<br />
* Up to 80 cpus<br />
* Up to 3TB memory<br />
* No gpus<br />
<br />
== Storage ==<br />
About a petabyte of raw disk storage is available to the MARC cluster, but for error checking and performance reasons, the amount of usable storage for researchers' projects is considerably less than that. From a user's perspective, the total amount of storage is less important than the individual storage limits. As described below, there are two storage areas: home and project.<br />
<br />
=== Home file system: /home ===<br />
There is a per-user quota of 25 GB under /home. This limit is fixed and cannot be increased. Each user has a directory under /home, which is the default working directory when logging in to MARC. It is expected that most researchers will work from /project and only use home for software and such things. /home is expected to be used only for L1/L2 data and not for your patient identifiable files. The identifiable files go in the appropriate directory under /project.<br />
<br />
=== Project file system for larger projects: /project ===<br />
Directories will be created in /project named after your project ID. This name will be the same as your SCDS share name. The expectation is that all files to do with that project will be stored in /project/projectid. Quotas in /project are somewhat flexible. Please write to support@hpc.ucalgary.ca with an estimate of how much space you will require.<br />
<br />
== Software installations ==<br />
<br />
=== Python ===<br />
<br />
There are some complications in using Python on MARC relative to using ARC. <br />
Normally, we would recommend installing conda in user's home directory. <br />
On MARC, security requirements for working with L4 data require that we block outgoing and incoming internet connections. <br />
As a result, new packages cannot be downloaded with conda. <br />
<br />
Depending on what you need, the two recommendations we can make are<br />
<br />
* Download the standard anaconda distribution from the anaconda website to a personal computer: https://repo.anaconda.com/archive/Anaconda3-2020.07-Linux-x86_64.sh<br />
** Transfer the script to MARC via SCDS<br />
** Copy it to your /home directory<br />
** Install it in your home directory with <code>$ bash Anaconda3-2020.07-Linux-x86_64.sh</code><br />
** You will be asked to agree to a license agreement and to confirm that you wish to create the folder <code>anaconda3</code>. Once the installation completes, you will have a new directory <code>~/anaconda3</code> under your home directory. To use the local conda instance, add it to your path with <code>$ export PATH=~/anaconda3/bin:$PATH</code><br />
<br />
* Non-open source software which requires a connection to a license server may require admin assistance to set up. contact [mailto:support@hpc.ucalgary.ca support@hpc.ucalgary.ca] for support.<br /><br />
* Download a docker container with the software that you need including python (e.g. tensorflow-gpu)<br />
** Transfer the docker container to MARC via SCDS<br />
** Copy it to your /home directory<br />
** Run it with singularity<br />
<br />
== Further Reading ==<br />
See [[Running_jobs]] for information on starting a job<br />
<br />
[[Category:MARC]]<br />
[[Category:Guides]]</div>Tthomas