RCS Summer School 2024
Revision as of 22:17, 16 May 2024
Research Computing Services' 3rd annual summer school will run from Monday, June 10 through Wednesday, June 12, 2024, from 9 AM to 5 PM each day. The summer school consists of sessions and workshops spread across these 3 days and is completely free to all University of Calgary members.
Our goal for this year's summer school is to empower our researchers by inspiring what is possible on HPC infrastructure.
Registration
Registration is required to attend the RCS Summer School sessions. Registration is free for all members of the University of Calgary.
Seating is limited to approximately 100. If you are unable to attend after registering, please cancel or modify your registration, or notify us via email.
Topics
- Introduction to RCS services and HPC resources
- Introduction to Linux & Bash command line
- Using Linux utilities for large datasets
- Hands on with Linux & Slurm: Workshop
- Using Open OnDemand on ARC
- Develop a research data management plan
- Reproducible data management with DataLad
- Digital File Management
- Using containers in HPC with Apptainer
- Managing scientific software with Conda
- Research workflow development with Prefect
- AWS: ML in the Cloud, a walkthrough followed by a workshop
- NVIDIA: Workflow optimization using NVIDIA GPUs
- Dell & AMD: Machine learning with Dell and AMD
Schedule
The summer school sessions will be held in ICT 102 and ICT 114. Refreshments will be available in ICT 114 on all 3 days.
Time | June 10, Track 1 | June 10, Track 2 | June 11, Track 1 | June 11, Track 2 | June 12, Track 1 | June 12, Track 2 |
---|---|---|---|---|---|---|
8:30 AM | Registration & check-in (ICT 102) | | Registration & check-in (ICT 102) | | Registration & check-in (ICT 102) | |
9:00 AM | Introduction to RCS (ICT 102, 9:00 AM - 9:20 AM; Jill Kowalchuk) | Refreshments (ICT 114) | The Alliance: Introduction (ICT 102; Brock Kahanyshyn) | Refreshments (ICT 114) | TBD (ICT 102) | Refreshments (ICT 114) |
9:30 AM | Introduction to Linux, Bash, and the command line (ICT 102, 9:30 AM - 10:30 AM; Robert Fridman) | Data in Motion: Navigating Storage Solutions for Active Research Data (ICT 114, 9:30 AM - 11:20 AM; Ian Percel) | Introduction to HPC resources (ICT 102, 9:30 AM - 10:20 AM; Robert Fridman, Dave Schulz) | Reproducible Data Management with DataLad: Part II (ICT 114, 9:30 AM - 10:20 AM; David Deepwell, Pedro Martinez) | NVIDIA: Workflow Optimization with NVIDIA GPUs (ICT 102, 9:30 AM - 12:00 PM; Jonathan Dursi) | |
10:00 AM | | | | | | Refreshments (ICT 114) |
10:30 AM | Workshop: Hands on with Linux & Slurm (ICT 102, 10:30 AM - 11:50 AM; Robert Fridman) | | Linux tools & utilities for working with large data sets (ICT 102, 10:30 AM - 11:20 AM; Leo Leung, Dave Schulz) | | | |
11:00 AM | | | | | | |
11:30 AM | | Reproducible Data Management with DataLad: Part I (ICT 114, 11:30 AM - 12:20 PM; David Deepwell, Pedro Martinez) | RCS Q&A period: Ask RCS anything (ICT 102, 11:30 AM - 12:00 PM; RCS Team) | | | |
12:00 PM | Open OnDemand on ARC (ICT 102, 12:00 PM - 12:20 PM; Leo Leung) | | Lunch break (12:00 PM - 1:00 PM) | | Lunch break (12:00 PM - 1:00 PM) | |
12:30 PM | Lunch break (12:30 PM - 1:30 PM) | | | | | |
1:00 PM | | | Research Data Management and Data File Management (ICT 102, 1:00 PM - 2:20 PM; Jennifer Abel, Alex Thistlewood, Ingrid Reiche) | Refreshments (ICT 114) | Dell & AMD: Machine learning with Dell & AMD (ICT 102, 1:00 PM - 1:50 PM; Rob Lucas) | |
1:30 PM | AWS: Inspiring the art of the possible (ICT 102, 1:30 PM - 1:50 PM; AWS) | Refreshments (ICT 114) | | | | |
2:00 PM | AWS: How AWS works with Researchers (ICT 102, 2:00 PM - 2:20 PM; AWS) | | | | | |
2:30 PM | AWS: Machine Learning with low-code workshop (ICT 102, 2:30 PM - 4:50 PM; AWS) | | Introduction to containers with Apptainer (ICT 102, 2:30 PM - 3:20 PM; Tannistha Nandi) | | Prefect for Research Workflow Development (ICT 102, 2:30 PM - 3:50 PM; David Deepwell, Pedro Martinez) | |
3:00 PM | | | | | | |
3:30 PM | | | Managing scientific software with Conda (ICT 102, 3:30 PM - 4:20 PM; Dmitri Rozmanov) | | | |
4:00 PM | | | | | End of day: 4:00 PM | |
4:30 PM | | | End of day: 4:30 PM | | | |
5:00 PM | End of day: 5:00 PM | | | | | |
Sessions
Session | Time and Location | Synopsis |
---|---|---|
Introduction to RCS | 9:00 AM - 9:20 AM, ICT 102 | We will begin the summer school with a quick introduction by Jill Kowalchuk, the Interim Director of Research Computing Services. We'll go through who RCS is and the services that we offer. Speaker: Jill Kowalchuk. Level: Introductory. Prerequisites: None. |
Introduction to Linux, Bash, and the command line | 9:30 AM - 10:30 AM, ICT 102 | A quick crash course on how to use Linux, the Bash shell, and the command line in general. This beginner-friendly session requires no prior experience with Linux. We recommend bringing your own device to follow along. Speaker: Robert Fridman. Level: Introductory. Prerequisites: None. |
Workshop: Hands on with Linux & Slurm | 10:30 AM - 11:50 AM, ICT 102 | A follow-up workshop that builds on the basics covered in the Linux introduction session and goes into depth on how to use Slurm, the scheduler that RCS uses on its high-performance computing clusters. We recommend bringing your own device to follow along. Speaker: Robert Fridman. Level: Introductory. Prerequisites: None. |
Open OnDemand on ARC | 12:00 PM - 12:20 PM, ICT 102 | Did you know you can run a Linux desktop on ARC? In this session, we will do a quick demo of ARC Open OnDemand, a web interface that allows users to submit jobs that need graphical user interfaces. We will also cover how to monitor your jobs through Open OnDemand. Speaker: Leo Leung. Level: Introductory. Prerequisites: None. |
Data in Motion: Navigating Storage Solutions for Active Research Data | 9:30 AM - 11:20 AM, ICT 114 (Track 2) | Planning for and requesting specialized storage for large research projects can be a daunting proposition. The variety of storage options and the expected justifications for allocations locally at UCalgary, at national supercomputing sites, and in the public cloud can quickly become overwhelming. This talk aims to provide an introduction to the cost/benefit tradeoffs of different storage systems, when to reach out to different support services around the university for help in making critical decisions, and basic techniques for providing a quantitative justification for a storage request. Speaker: Ian Percel. Level: Introductory. Prerequisites: None. |
Reproducible Data Management with DataLad | June 10, 10:30 AM - 11:20 AM; June 11, 9:30 AM - 10:20 AM; ICT 114 | This workshop provides an introduction to digital data management with DataLad. Background content will be covered before the primary hands-on training, in which attendees will create a small demonstrative research project containing data provenance. Content to be covered includes dataset basics, capturing data provenance, and collaborative data analysis. DataLad is a git-based version control system. Although no git knowledge is required, familiarity with git is strongly advised. Command line experience is required. Speaker: David Deepwell and Pedro Martinez. Level: Introductory. Prerequisites: Command line experience. |
Introduction to HPC resources | 9:30 AM - 10:20 AM, ICT 102 | An introduction to the high-performance computing resources offered by RCS. We will go over how our infrastructure ties in to your research and how to make the most out of Slurm. We will also cover how to download data and transfer it to and from other institutions. Speaker: Robert Fridman, Dave Schulz. Level: Introductory. Prerequisites: None. |
Linux tools & utilities for working with large data sets | 10:30 AM - 11:20 AM, ICT 102 | As researchers use larger and larger datasets, it is imperative to handle and manage these datasets effectively. In this session, we will go through some common methods for working with datasets using standard Linux tools and utilities. We will cover common use cases such as downloading large datasets from the Internet and parsing text-based data with tools such as sed, awk, and grep, and will then tie everything together with pipes. Speaker: Robert Fridman, Dave Schulz. Level: Introductory. Prerequisites: Command line experience. |
RCS Q&A period: Ask RCS anything | 11:30 AM - 12:00 PM, ICT 102 | A general question-and-answer period where you can ask us anything related to RCS and HPC. Speaker: The RCS team. Level: Introductory. Prerequisites: None. |
Research Data Management and Data File Management | 1:00 PM - 2:20 PM, ICT 102 | Managing your digital files and research materials is critical for keeping yourself organized, collaborating, and communicating with colleagues. In this session, we will cover Research Data Management (RDM) and Data Management Plans (DMPs). We will also go over best practices in digital file management depending on your individual and organizational needs, including versioning and how to document and share your file and folder conventions using a README file. Speaker: Jennifer Abel, Alex Thistlewood, and Ingrid Reiche (from The University of Calgary Libraries and Cultural Resources). Level: Introductory. Prerequisites: None. |
Introduction to containers with Apptainer | 2:30 PM - 3:20 PM, ICT 102 | Make your research workflows reproducible through the power of containers. We will go through in detail how to run containers on ARC using Apptainer. Speaker: Tannistha Nandi. Level: Introductory. Prerequisites: None. |
Managing scientific software with Conda | 3:30 PM - 4:20 PM, ICT 102 | Running customized scientific software in a shared HPC environment can be challenging. In this session, we will go over how to set up customized software environments using Conda. Speaker: Tannistha Nandi. Level: Introductory. Prerequisites: None. |
Prefect for Research Workflow Development | 2:30 PM - 3:50 PM, ICT 102 | Modernize your research workflows using Prefect, an open source workflow orchestration tool. In this session, we will cover some of the fundamentals of building workflows with Prefect, with examples of how to deploy Prefect on local and distributed computing infrastructure. Speaker: David Deepwell and Pedro Martinez. Level: Introductory. Prerequisites: None. |
AWS: Inspiring the art of the possible | 1:30 PM - 1:50 PM, ICT 102 | Learn what is possible on AWS Cloud for research. Speaker: AWS. Level: Introductory. Prerequisites: None. |
AWS: How AWS works with Researchers | 1:30 PM - 1:50 PM, ICT 102 | AWS has many programs to support researchers, such as credits, letters of support, immersion days, and proof-of-concept work. In this session, we will cover how we engage with researchers and what programs are out there to help accelerate your research with the AWS Cloud. Speaker: AWS. Level: Introductory. Prerequisites: None. |
AWS: Machine learning with low-code workshop | 1:30 PM - 4:45 PM, ICT 102 | The Machine Learning (ML) journey requires continuous experimentation and rapid prototyping to be successful. In order to create highly accurate and performant models, data scientists first have to experiment with feature engineering, model selection, and optimization techniques. These processes are traditionally time consuming and expensive. Topics covered in this workshop include how the low-code ML capabilities found in Amazon SageMaker Data Wrangler, Autopilot, and JumpStart make it easier to experiment faster and bring highly accurate models to production more quickly and efficiently; how to simplify the process of data preparation and feature engineering, and complete each step of the data preparation workflow; and how to get started with ML easily and quickly using pre-built solutions for common financial use cases and open source models from popular model zoos. Speaker: AWS. Level: Introductory. Prerequisites: None. |
Workflow Optimization with NVIDIA GPUs | 9:30 AM - 12:20 PM, ICT 102 | We will discuss how to optimize workflows with NVIDIA-powered GPUs to help accelerate your research. Speaker: Jonathan Dursi from NVIDIA. Level: Introductory. Prerequisites: None. |
Dell Presentation: TBD | 1:00 PM - 1:50 PM, ICT 102 | TBD. Speaker: Rob Lucas from Dell. Level: Introductory. Prerequisites: None. |