RCS Summer School 2024: Difference between revisions
Line 156: | Line 156: | ||
|This follow-up workshop comes immediately after the Introduction to Linux session. We will build on what we learned in the previous session and go into details on how to use the HPC cluster using the Slurm scheduler. | |This follow-up workshop comes immediately after the Introduction to Linux session. We will build on what we learned in the previous session and go into details on how to use the HPC cluster using the Slurm scheduler. | ||
This workshop will provide you with the skills necessary to write a simple Slurm batch script, submit jobs to Slurm, and view and manage your jobs. By the end of the course, you will be familiar with what Slurm is, how it fits into an HPC environment, and how to start using Slurm on our HPC clusters for your research. | This workshop will provide you with the skills necessary to write a simple Slurm batch script, submit jobs to Slurm, and view and manage your jobs. By the end of the course, you will be familiar with what Slurm is, how it fits into an HPC environment, and how to start using Slurm on our HPC clusters for your research. | ||
This is a beginner friendly workshop. You should be familiar with the Linux command line. We recommend bringing your own device to follow along. | This is a beginner friendly workshop. You should be familiar with the Linux command line. We recommend bringing your own device to follow along. | ||
Line 202: | Line 203: | ||
Content to be covered in the two-part session includes: | Content to be covered in the two-part session includes: | ||
* | * Dataset basics, | ||
* | * Capturing data-provenance, and | ||
* | * Collaborative data analysis. | ||
Although no git knowledge is required, familiarity with git is strongly advised. Command line experience is required. | Background content will be covered before conducting the primary hands-on training where attendees will create a small demonstrative research project containing data provenance. Although no git knowledge is required, familiarity with git is strongly advised. Command line experience is required. | ||
* '''Speaker:''' David Deepwell and Pedro Martinez | * '''Speaker:''' David Deepwell and Pedro Martinez | ||
* '''Format:''' Lecture + Hands on | |||
* '''Level:''' Introductory | * '''Level:''' Introductory | ||
* '''Prerequisites:''' Command line experience | * '''Prerequisites:''' Command line experience, Familiarity with Git | ||
|- | |- | ||
! | ! | ||
Line 215: | Line 216: | ||
|June 11, 9:30AM - 10:20AM | |June 11, 9:30AM - 10:20AM | ||
ICT 102 | ICT 102 | ||
| | |This session is a primer for those new to high performance computing (HPC) or computing on remote resources. We will build on the foundations built from our previous Linux and Slurm introductory sessions and expand on the larger picture, including: | ||
* Motivation for using HPC | |||
* Finding available resources on HPC resources | |||
* Issues and pitfalls to avoid (such as incorrect job resource requests) | |||
* Troubleshooting job failures | |||
* High level overview of parallel programming with Slurm | |||
* How to transfer data to/from other institutions | |||
This is a beginner friendly workshop. You should be familiar with the Linux command line. We recommend bringing your own device to follow along. | |||
* '''Speaker:''' Robert Fridman, Dave Schulz | * '''Speaker:''' Robert Fridman, Dave Schulz | ||
* '''Format:''' Lecture | |||
* '''Level:''' Introductory | * '''Level:''' Introductory | ||
* '''Prerequisites:''' | * '''Prerequisites:''' Linux command line, Slurm | ||
|- | |- | ||
! | ! | ||
Line 225: | Line 234: | ||
|June 11, 10:30AM - 11:20AM | |June 11, 10:30AM - 11:20AM | ||
ICT 102 | ICT 102 | ||
| | |This session introduces more intermediate to advanced uses of the Linux environment for handling large data sets. The course will demonstrate the power of shell pipes and how you can work with large datasets with just the standard Linux tools and utilities that are built into the system. | ||
We will cover some common use cases including: | |||
* How to download large datasets from the Internet | |||
* How to parse text-based data using tools such as sed, awk, grep | |||
* How to build powerful text mining, conversion, and visualization with just the command line | |||
* '''Speaker:''' | This is an intermediate course. You should be familiar with the Linux command line and some common Linux utilities prior to the course. Some understanding of regular expressions may be useful. | ||
* '''Level:''' Introductory | * '''Speaker:''' Leo Leung, Dave Schulz | ||
* '''Format:''' Lecture | |||
* '''Level:''' Introductory to Intermediate | |||
* '''Prerequisites:''' Command line experience | * '''Prerequisites:''' Command line experience | ||
|- | |- | ||
Line 235: | Line 251: | ||
|June 11, 11:30AM - 12:00PM | |June 11, 11:30AM - 12:00PM | ||
ICT 102 | ICT 102 | ||
| | |This is a general question and answers period where you may ask the Research Computing Services team questions related to RCS and HPC. You may ask both technical and non-technical questions. | ||
* '''Speaker:''' The RCS team | * '''Speaker:''' The RCS team | ||
* '''Format:''' Question & Answer period | |||
* '''Level:''' Introductory | * '''Level:''' Introductory | ||
* '''Prerequisites:''' None | * '''Prerequisites:''' None | ||
Line 245: | Line 262: | ||
|June 11, 1:00PM - 2:20PM | |June 11, 1:00PM - 2:20PM | ||
ICT 102 | ICT 102 | ||
|Managing your digital files and research materials is critical for keeping yourself organized, collaborating, and communicating with colleagues. In this session, we will cover Research Data Management (RDM) and Data Management Plan (DMP). We will also go over best practices in digital file management depending on your individual and organizational needs. This presentation will also discuss best practices, versioning, and how to document and share your file and folder convention using a README file. | |Managing your digital files and research materials is critical for keeping yourself organized, collaborating, and communicating with colleagues. In this session, we will cover Research Data Management (RDM) and Data Management Plan (DMP). We will also go over best practices in digital file management depending on your individual and organizational needs. | ||
This presentation will also discuss best practices, versioning, and how to document and share your file and folder convention using a README file. | |||
By the end of this session, you should be familiar with RDM and DMP concepts to help keep your research materials organized. | |||
* '''Speaker:''' Jennifer Abel, Alex Thistlewood, and Ingrid Reiche (from The University of Calgary Libraries and Cultural Resources) | * '''Speaker:''' Jennifer Abel, Alex Thistlewood, and Ingrid Reiche (from The University of Calgary Libraries and Cultural Resources) | ||
* '''Format:''' Lecture | |||
* '''Level:''' Introductory | * '''Level:''' Introductory | ||
* '''Prerequisites:''' None | * '''Prerequisites:''' None | ||
Line 255: | Line 275: | ||
|June 11, 2:30PM - 3:20PM | |June 11, 2:30PM - 3:20PM | ||
ICT 102 | ICT 102 | ||
| | |Reproducible research workflows are essential for repeatability. This session will cover the basics of using containers with Apptainer, a secure container technology designed for use on high performance compute clusters. We will cover: | ||
* How to use Apptainer to run a containerized environment | |||
* How to build Apptainer containers | |||
* How to deploy software inside Apptainer containers | |||
* How to use Apptainer containers with your Slurm job submissions. | |||
The instructor for this session will be remote and will be streamed in ICT 102. We will provide a Zoom link for those who wish to attend virtually. | |||
* '''Speaker:''' Tannistha Nandi | * '''Speaker:''' Tannistha Nandi | ||
* '''Format:''' Lecture + Hands on | |||
* '''Level:''' Introductory | * '''Level:''' Introductory | ||
* '''Prerequisites:''' None | * '''Prerequisites:''' None | ||
Line 278: | Line 305: | ||
* '''Speaker:''' David Deepwell and Pedro Martinez | * '''Speaker:''' David Deepwell and Pedro Martinez | ||
* '''Format:''' Lecture + Hands on | |||
* '''Level:''' Introductory | * '''Level:''' Introductory | ||
* '''Prerequisites:''' None | * '''Prerequisites:''' None | ||
Line 289: | Line 317: | ||
* '''Speaker:''' AWS | * '''Speaker:''' AWS | ||
* '''Format:''' Lecture | |||
* '''Level:''' Introductory | * '''Level:''' Introductory | ||
* '''Prerequisites:''' None | * '''Prerequisites:''' None | ||
Line 299: | Line 328: | ||
* '''Speaker:''' AWS | * '''Speaker:''' AWS | ||
* '''Format:''' Lecture | |||
* '''Level:''' Introductory | * '''Level:''' Introductory | ||
* '''Prerequisites:''' None | * '''Prerequisites:''' None | ||
Line 314: | Line 344: | ||
* '''Speaker:''' AWS | * '''Speaker:''' AWS | ||
* '''Format:''' Workshop + Hands on | |||
* '''Level:''' Introductory | * '''Level:''' Introductory | ||
* '''Prerequisites:''' None | * '''Prerequisites:''' None | ||
Line 322: | Line 353: | ||
|June 12, 9:30AM - 12:20PM | |June 12, 9:30AM - 12:20PM | ||
ICT 102 | ICT 102 | ||
| | |TBD | ||
* '''Speaker:''' Jonathan Dursi from NVIDIA | * '''Speaker:''' Jonathan Dursi from NVIDIA | ||
* '''Format:''' Lecture | |||
* '''Level:''' Introductory | * '''Level:''' Introductory | ||
* '''Prerequisites:''' None | * '''Prerequisites:''' None | ||
Line 336: | Line 368: | ||
* '''Speaker:''' Rob Lucas from Dell | * '''Speaker:''' Rob Lucas from Dell | ||
* '''Format:''' Lecture | |||
* '''Level:''' Introductory | * '''Level:''' Introductory | ||
* '''Prerequisites:''' None | * '''Prerequisites:''' None |
Revision as of 23:10, 16 May 2024
Research Computing Services' 3rd annual summer school will run from Monday, June 10 through to Wednesday, June 12, 2024 from 9AM to 5PM. This summer school consists of various sessions and workshops throughout these 3 days and is completely free to all University of Calgary members.
Our goal for this year's summer school is to Empower our researchers: Inspiring what is possible on HPC infrastructure.
Registration
Registration is required to attend the RCS Summer School sessions. Registration is free to all members of the University of Calgary.
There will be a limit of approximately 100 seats. If you are unable to attend after registering, please cancel/modify your registration or notify us via email.
Topics
- Introduction to RCS services and HPC resources
- Introduction to Linux & Bash command line
- Using Linux utilities for large datasets
- Hands on with Linux & Slurm: Workshop
- Using Open OnDemand on ARC
- Develop a research data management plan
- Reproducible data management with Datalad
- Digital File Management
- Using containers in HPC with Apptainer
- Managing scientific software with Conda
- Research workflow development with Prefect
- AWS: ML in the Cloud, a walkthrough followed by a workshop
- NVIDIA: Workflow optimization using NVIDIA GPUs
- Dell & AMD: Machine learning with Dell and AMD
Schedule
The summer school sessions will be held in ICT 102 and ICT 114. Refreshments will be available in ICT 114 on all 3 days.
Time | June 10 | June 11 | June 12 | |||
---|---|---|---|---|---|---|
Track 1 | Track 2 | Track 1 | Track 2 | Track 1 | Track 2 | |
8:30 AM | Registration & check-in ICT 102 |
Registration & check-in ICT 102 |
Registration & check-in ICT 102 | |||
9:00 AM | Introduction to RCS ICT 102, 9:00 AM - 9:20 AM Jill Kowalchuk |
Refreshments ICT 114 |
The Alliance: Introduction ICT 102 Brock Kahanyshyn |
Refreshments ICT 114 |
TBD
ICT 102 |
Refreshments ICT 114 |
9:30 AM | Introduction to Linux, Bash, and the command line ICT 102, 9:30 AM - 10:30 AM Robert Fridman |
Data in Motion: Navigating Storage Solutions for Active Research Data ICT 114, 9:30 AM - 11:20 AM Ian Percel |
Introduction to HPC resources ICT 102, 9:30 AM - 10:20 AM Robert Fridman, Dave Schulz |
Reproducible Data Management with Datalad: Part II ICT 114, 9:30 AM - 10:20 AM David Deepwell, Pedro Martinez |
NVIDIA: Workflow Optimization with NVIDIA GPUs ICT 102, 9:30 AM - 12:00 PM Jonathan Dursi | |
10:00 AM | Refreshments ICT 114 | |||||
10:30 AM | Workshop: Hands on with Linux & Slurm ICT 102, 10:30 AM - 11:50 AM Robert Fridman |
Linux tools & utilities for working with large data sets ICT 102, 10:30 AM - 11:20 AM Leo Leung, Dave Schulz | ||||
11:00 AM | ||||||
11:30 AM | Reproducible Data Management with Datalad: Part I ICT 114, 11:30 AM - 12:20 PM David Deepwell, Pedro Martinez |
RCS Q&A period: Ask RCS anything ICT 102, 11:30 AM - 12:00 PM RCS Team | ||||
12:00 PM | Open OnDemand on ARC ICT 102, 12:00 PM - 12:20 PM Leo Leung |
Lunch break 12:00 PM - 1:00 PM |
Lunch break 12:00 PM - 1:00 PM | |||
12:30 PM | Lunch break 12:30 PM - 1:30 PM | |||||
1:00 PM | Research Data Management and Data File Management ICT 102, 1:00 PM - 2:20 PM Jennifer Abel, Alex Thistlewood, Ingrid Reiche |
Refreshments ICT 114 |
Dell & AMD: Machine learning with Dell & AMD ICT 102, 1:00 PM - 1:50 PM Rob Lucas | |||
1:30 PM | AWS: Inspiring the art of the possible ICT 102, 1:30 PM - 1:50 PM AWS |
Refreshments ICT 114 | ||||
2:00 PM | AWS: How AWS works with Researchers ICT 102, 2:00 PM - 2:20 PM AWS | |||||
2:30 PM | AWS: Machine Learning with low-code workshop ICT 102, 2:30 PM - 4:50 PM AWS |
Introduction to containers with Apptainer ICT 102, 2:30 PM - 3:20 PM Tannistha Nandi |
Prefect for Research Workflow Development ICT 102, 2:30 PM - 3:50 PM David Deepwell, Pedro Martinez | |||
3:00 PM | ||||||
3:30 PM | Managing scientific software with Conda ICT 102, 3:30 PM - 4:20 PM Dmitri Rozmanov | |||||
4:00 PM | End of day: 4:00 PM | |||||
4:30 PM | End of day: 4:30 PM | |||||
5:00 PM | End of day: 5:00 PM |
Sessions
Session | Time and Location | Synopsis |
---|---|---|
Introduction to RCS |
June 10, 9:00AM - 9:20AM
ICT 102 |
We will begin the RCS summer school with a quick introduction by Jill Kowalchuk, the interim director of Research Computing Services. We will introduce the RCS team, provide a high level overview of our services, and explain how to get help and support from our analysts.
|
Introduction to Linux, Bash, and the command line |
June 10, 9:30AM - 10:30AM
ICT 102 |
This course provides you with essential skills to effectively use the Linux command line. Starting from the ground up, we will cover how to log in to and interact with our HPC cluster, traverse the filesystem, execute programs, and manage files.
This beginner friendly session requires no prior experience with Linux. We recommend bringing your own device to follow along. By the end of the course, you should be familiar with what is possible with the Linux command line.
|
Workshop: Hands on with Linux & Slurm |
June 10, 10:30AM - 11:50 AM
ICT 102 |
This follow-up workshop comes immediately after the Introduction to Linux session. We will build on what we learned in the previous session and go into details on how to use the HPC cluster using the Slurm scheduler.
This workshop will provide you with the skills necessary to write a simple Slurm batch script, submit jobs to Slurm, and view and manage your jobs. By the end of the course, you will be familiar with what Slurm is, how it fits into an HPC environment, and how to start using Slurm on our HPC clusters for your research. This is a beginner friendly workshop. You should be familiar with the Linux command line. We recommend bringing your own device to follow along.
|
Open OnDemand on ARC |
June 10, 12:00 PM - 12:20 PM
ICT 102 |
Did you know you can run a Linux desktop and graphical tools on ARC? This session will cover what ARC Open OnDemand is and how it may help with your research. We will show you how to:
By the end of this session, you will be familiar with the options available on Open OnDemand and be able to start graphical sessions through this service. This is a beginner friendly workshop and no prior experience is necessary. We recommend bringing your own device to follow along.
|
Data in Motion: Navigating Storage Solutions for Active Research Data |
June 10, 9:30AM - 11:20AM
ICT 114, Track 2 |
Planning for and requesting specialized storage for large research projects can be a daunting proposition. The variety of storage options and the expected justifications for allocations locally to UCalgary, at national supercomputing sites, and in the public cloud can quickly become overwhelming. This talk aims to provide an introduction to the cost/benefit tradeoff in using different storage systems, when to reach out to different support services around the university for help in making critical decisions, and basic techniques for providing a quantitative justification for a storage request.
By the end of the session, you will be familiar with the types of storage related questions that should be answered when tackling large research projects and the different types of solutions that the University offers our researchers.
|
Reproducible Data Management with Datalad |
June 10, 10:30AM - 11:20AM
June 11, 9:30AM - 10:20AM ICT 114, Track 2 |
Managing research data well is critical to research. This is a two-part workshop that introduces you to DataLad, a digital data management system based on the Git version control system.
Content to be covered in the two-part session includes:
- Dataset basics,
- Capturing data-provenance, and
- Collaborative data analysis.
Background content will be covered before conducting the primary hands-on training where attendees will create a small demonstrative research project containing data provenance. Although no git knowledge is required, familiarity with git is strongly advised. Command line experience is required.
|
Introduction to HPC resources |
June 11, 9:30AM - 10:20AM
ICT 102 |
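This session is a primer for those new to high performance computing (HPC) or computing on remote resources. We will build on the foundations laid in our previous Linux and Slurm introductory sessions and expand on the larger picture, including:
- Motivation for using HPC
- Finding available resources on HPC systems
- Issues and pitfalls to avoid (such as incorrect job resource requests)
- Troubleshooting job failures
- High level overview of parallel programming with Slurm
- How to transfer data to/from other institutions
This is a beginner friendly workshop. You should be familiar with the Linux command line. We recommend bringing your own device to follow along.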
|
Linux tools & utilities for working with large data sets |
June 11, 10:30AM - 11:20AM
ICT 102 |
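This session introduces more intermediate to advanced uses of the Linux environment for handling large data sets. The course will demonstrate the power of shell pipes and how you can work with large datasets with just the standard Linux tools and utilities that are built into the system.
We will cover some common use cases including:
- How to download large datasets from the Internet
- How to parse text-based data using tools such as sed, awk, grep
- How to perform powerful text mining, conversion, and visualization with just the command line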
This is an intermediate course. You should be familiar with the Linux command line and some common Linux utilities prior to the course. Some understanding of regular expressions may be useful.
|
RCS Q&A period: Ask RCS anything |
June 11, 11:30AM - 12:00PM
ICT 102 |
This is a general question and answer period where you may ask the Research Computing Services team questions related to RCS and HPC. You may ask both technical and non-technical questions.
|
Research Data Management and Data File Management |
June 11, 1:00PM - 2:20PM
ICT 102 |
Managing your digital files and research materials is critical for keeping yourself organized, collaborating, and communicating with colleagues. In this session, we will cover Research Data Management (RDM) and Data Management Plan (DMP). We will also go over best practices in digital file management depending on your individual and organizational needs.
This presentation will also discuss best practices, versioning, and how to document and share your file and folder convention using a README file. By the end of this session, you should be familiar with RDM and DMP concepts to help keep your research materials organized.
|
Introduction to containers with Apptainer |
June 11, 2:30PM - 3:20PM
ICT 102 |
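Reproducible research workflows are essential for repeatability. This session will cover the basics of using containers with Apptainer, a secure container technology designed for use on high performance compute clusters. We will cover:
- How to use Apptainer to run a containerized environment
- How to build Apptainer containers
- How to deploy software inside Apptainer containers
- How to use Apptainer containers with your Slurm job submissions.
The instructor for this session will be remote and will be streamed in ICT 102. We will provide a Zoom link for those who wish to attend virtually.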
|
Managing scientific software with Conda |
June 11, 3:30PM - 4:20PM
ICT 102 |
Running customized scientific software on a shared HPC environment may be challenging. In this session, we will go over how to set up customized software environments using Conda.
|
Prefect for Research Workflow Development |
June 12, 2:30PM - 3:50PM
ICT 102 |
Modernize your research workflows using Prefect, an open source workflow orchestration tool. In this session we will cover some of the fundamentals of building workflows with Prefect, with examples on how to deploy Prefect on local and distributed computing infrastructure.
|
AWS: Inspiring the art of the possible |
June 11, 1:30PM - 1:50PM
ICT 102 |
Learn what is possible on AWS Cloud for research.
|
AWS: How AWS works with Researchers |
June 11, 1:30PM - 1:50PM
ICT 102 |
AWS has many programs to support researchers, such as credits, letters of support, immersion days, and work on proofs of concept. In this session, we will cover how we engage with researchers and what programs are out there to help accelerate your research with the AWS Cloud.
|
AWS: Machine learning with low-code workshop |
June 11, 1:30 PM - 4:45 PM
ICT 102 |
The Machine Learning (ML) journey requires continuous experimentation and rapid prototyping to be successful. In order to create highly accurate and performant models, data scientists have to first experiment with feature engineering, model selection and optimization techniques. These processes are traditionally time consuming and expensive.
In this workshop attendees will learn the following:
|
Workflow Optimization with NVIDIA GPUs |
June 12, 9:30AM - 12:20PM
ICT 102 |
TBD
|
Dell Presentation: TBD |
June 12, 1:00 PM - 1:50 PM
ICT 102 |
TBD
|