RCS Summer School 2024
Research Computing Services' 3rd annual summer school offers a range of courses covering a wide variety of topics to help empower your research. We will cover topics including Linux/Slurm, ARC/HPC, Research Data Management (RDM) and Data Management Plans (DMPs), working with research software and workflows, plus much more. The sessions and workshops range from introductory to intermediate levels and are suitable for anyone interested in using HPC for research.

Our goal for this year's summer school is to '''Empower our researchers:''' Inspiring what is possible on HPC infrastructure.

The summer school will run from Monday, June 10 through Wednesday, June 12, 2024, from 9AM to 5PM. This 3-day event is completely '''''<u>free</u>''''' to all University of Calgary members.
[[File:RCS Summer School 2024 Poster.png|border|center|frameless|850x850px|RCS Summer School 2024 Poster]]


== Survey ==
<span id="survey">‎</span>Please complete our post-event survey to help us improve our future sessions. Each completed survey will enter you into a draw for a $50 gift card. We will contact the winner after the summer school.
<!--
<span class="registerButton">[https://forms.office.com/Pages/ResponsePage.aspx?id=7KAJxuOlMUaWhhkigL2RUYRT1Hf69QZFtxYPy8rvtBRUN0lNNlg0UzRONVdNQUk0T0VCM1pOUFBENiQlQCN0PWcu Complete Per-Session Survey]</span>
-->


<span class="registerButton">[https://forms.office.com/Pages/ResponsePage.aspx?id=7KAJxuOlMUaWhhkigL2RUcIYTwiXbi5NnglJOLr1trhUMEtYVFNUT1YwWkpBNEQzQ1RCUkpMVjdFQS4u Complete Post-Event Survey]</span>


== Venue ==
The RCS Summer School 2024 will be held in ICT 102 and ICT 114.

Prior to attending, please note the following:

* Registration and check-in will take place outside ICT 102. Please pick up your name tag there.
* '''Please arrive 5-15 minutes before the start of the morning or afternoon sessions''' to prevent delays (and to pick up a coffee/snack before heading into the sessions).
* '''Please bring your own device''' for the follow-along / workshop sessions. We are in a lecture theatre with no lab computers.
* We will automatically provision access to our Teaching and Learning Cluster (TALC) for all attendees. Instructions on how to connect will be provided during the sessions.

=== Directions ===
Visit the UCalgary campus maps and room finder at: https://www.ucalgary.ca/about/our-campuses/campus-maps-and-room-finder

The ICT building floor map is available at: https://ucalgary-gs.maps.arcgis.com/sharing/rest/content/items/236901d34b62420a87e99b947eae9e71/data.


== Schedule ==
The summer school sessions will be held in ICT 102 and ICT 114. Refreshments will be available in ICT 114 on all 3 days to anyone registered for the event, regardless of track.
{| class="wikitable table-left-aligned"
! rowspan="2" |Time
! colspan="2" |June 10
! colspan="2" |June 11
! colspan="2" |June 12
|-
! width="15%" | Track 1
! width="15%" |Track 2
! width="15%" |Track 1
! width="15%" |Track 2
! width="15%" |Track 1
! width="15%" |Track 2
|-
! width="10%" |8:30 AM
| colspan="2" | '''Morning registration & check-in'''<br>ICT 102
| colspan="2" | '''Morning registration & check-in'''<br>ICT 102
| colspan="2" | '''Morning registration & check-in'''<br>ICT 102
|-
!9:00 AM
| colspan="2" |'''[[RCS Summer School 2024#Introduction to the Summer School & RCS|Introduction to the Summer School & RCS]]'''<br>ICT 102, 9:00 AM - 9:20 AM<br>Jill Kowalchuk
| colspan="2" |'''The Alliance: An Introduction'''<br>ICT 102, 9:00 AM - 9:20 AM<br>Brock Kahanyshyn
| rowspan="2" |'''[[RCS Summer School 2024#Introduction to containers with Apptainer|Introduction to containers with Apptainer]]'''<br>ICT 102, 9:00 AM - 9:50 AM<br>Tannistha Nandi
| rowspan="2" |'''[[RCS Summer School 2024#Fast Dataframes with Polars on Python|Fast Dataframes with Polars on Python]]'''<br>ICT 114, 9:00 AM - 9:50 AM<br>Dave Schulz
|-
!9:30 AM
| rowspan="2" |'''[[RCS Summer School 2024#Introduction to Linux, Bash, and the command line|Introduction to Linux, Bash, and the command line]]'''<br>ICT 102, 9:30 AM - 10:30 AM<br>Robert Fridman
| rowspan="5" |'''[[RCS Summer School 2024#NVIDIA|Accelerate data science workflows with NVIDIA RAPIDS]]'''<br>ICT 114, 9:30 AM - 11:50 AM<br>Tarini Bhatnagar
| rowspan="2" |'''[[RCS Summer School 2024#Introduction to HPC resources|Introduction to HPC resources]]'''<br>ICT 102, 9:30 AM - 10:20 AM<br>Robert Fridman, Dave Schulz
| rowspan="3" |'''[[RCS Summer School 2024#Prefect for Research Workflow Development|Prefect for Research Workflow Development]]'''<br>ICT 114, 9:30 AM - 11:00 AM<br>Pedro Martinez
|-
!10:00 AM
| rowspan="2" |'''[[RCS Summer School 2024#Managing scientific software with Conda|Managing scientific software with Conda]]'''<br>ICT 102, 10:00 AM - 10:50 AM<br>Dmitri Rozmanov
| rowspan="4" |'''Track 2 ends'''<br>Refreshments available in<br>ICT 114
|-
!10:30 AM
| rowspan="3" |'''[[RCS Summer School 2024#Workshop: Hands on with Linux & Slurm|Hands on with Linux & Slurm]]'''<br>ICT 102, 10:30 AM - 11:50 AM<br>Robert Fridman
| rowspan="2" |'''[[RCS Summer School 2024#Linux tools & utilities for working with large data sets|Linux tools & utilities for working with large data sets]]'''<br>ICT 102, 10:30 AM - 11:20 AM<br>Leo Leung, Dave Schulz
|-
!11:00 AM
|'''Track 2 ends'''<br>Refreshments available in<br>ICT 114
| rowspan="2" |'''[[RCS Summer School 2024#Reproducible Data Management with Datalad|Reproducible Data Management with Datalad: Part I]]'''<br>ICT 102, 11:00 AM - 11:50 AM<br>David Deepwell
|-
!11:30 AM
| colspan="2" |'''[[RCS Summer School 2024#RCS Q&A period: Ask RCS anything|RCS Q&A period: Ask RCS anything]]'''<br>ICT 102, 11:30 AM - 12:00 PM<br>RCS Team
|-
!12:00 PM
| colspan="2" rowspan="2" |'''Lunch break, Afternoon registration & check-in'''<br>12:00 PM - 1:00 PM
| colspan="2" rowspan="2" |'''Lunch break, Afternoon registration & check-in'''<br>12:00 PM - 1:00 PM
| colspan="2" rowspan="2" |'''Lunch break, Afternoon registration & check-in'''<br>12:00 PM - 1:00 PM
|-
!12:30 PM
|-
!1:00 PM
|'''[[RCS Summer School 2024#Open OnDemand on ARC|Open OnDemand on ARC]]'''<br>ICT 102, 1:00 PM - 1:20 PM<br>Leo Leung
| rowspan="8" |'''No Track 2'''<br>Refreshments available in<br>ICT 114
| rowspan="3" |'''[[RCS Summer School 2024#Research Data Management and Data File Management|Research Data Management and Data File Management]]'''<br>ICT 102, 1:00 PM - 2:20 PM<br>Ingrid Reiche, Jennifer Abel, Alex Thistlewood
| rowspan="7" |'''No Track 2'''<br>Refreshments available in<br>ICT 114
| rowspan="2" |'''[[RCS Summer School 2024#Dell|Dell & AMD: Machine learning with Dell & AMD]]'''<br>ICT 102, 1:00 PM - 2:30 PM<br>Rob Parish
| rowspan="5" |'''No Track 2'''<br>Refreshments available in<br>ICT 114
|-
!1:30 PM
|'''[[RCS Summer School 2024#AWS|AWS: Inspiring the art of the possible]]'''<br>ICT 102, 1:30 PM - 2:00 PM<br>Patrick Colucci
|-
!2:00 PM
|'''[[RCS Summer School 2024#AWS|AWS: How AWS works with Researchers]]'''<br>ICT 102, 2:00 PM - 2:30 PM<br>Jessica Steed & Hatem Siyala
| rowspan="3" |'''[[RCS Summer School 2024#Reproducible Data Management with Datalad|Reproducible Data Management with Datalad: Part II]]'''<br>ICT 102, 2:30 PM - 3:20 PM<br>David Deepwell
|-
!2:30 PM
| rowspan="5" |'''[[RCS Summer School 2024#AWS|AWS: Machine Learning with low-code workshop]]'''<br>ICT 102, 2:30 PM - 5:00 PM<br>Abhi Sodhani
| rowspan="4" |'''[[RCS Summer School 2024#Data in Motion: Navigating Storage Solutions for Active Research Data|Data in Motion: Navigating Storage Solutions for Active Research Data]]'''<br>ICT 102, 2:30 PM - 4:20 PM<br>Ian Percel, Jennifer Abel, Alex Thistlewood
|-
!3:00 PM
|-
!3:30 PM
| colspan="2" rowspan="4" |'''End of day: 3:30 PM'''
|-
!4:00 PM
|-
!4:30 PM
| colspan="2" rowspan="2" |'''End of day: 4:30 PM'''
|-
!5:00 PM
| colspan="2" |'''End of day: 5:00 PM'''
|}


== Sessions ==
{| class="wikitable table-left-aligned"
! width="20%" |Session
! width="20%" |Time and Location
! width="60%" |Synopsis
|-
!
====Introduction to the Summer School & RCS====
|June 10, 9:00AM - 9:20AM
ICT 102
|We will begin the RCS summer school with a quick introduction by Jill Kowalchuk, the Interim Director of Research Computing Services. We will introduce the RCS team, provide a high-level overview of our services, and explain how to get help and support from our analysts.
*'''Speaker:''' Jill Kowalchuk
*'''Format:''' Lecture
*'''Level:''' Introductory
*'''Prerequisites:''' None
|-
!
====Introduction to Linux, Bash, and the command line====
|June 10, 9:30AM - 10:30AM
ICT 102
|This course provides you with essential skills to effectively use the Linux command line. Starting from the ground up, we will go over how to log in to and interact with our HPC cluster, traverse the filesystem, execute programs, and manage files.
This beginner-friendly session requires no prior experience with Linux. We recommend bringing your own device to follow along. By the end of the course, you should be familiar with what is possible with the Linux command line.
*'''Speaker:''' Robert Fridman
*'''Format:''' Lecture + Follow along
*'''Level:''' Introductory
*'''Prerequisites:''' None
|-
!
====Workshop: Hands on with Linux & Slurm====
|June 10, 10:30AM - 11:50 AM
ICT 102
|This follow-up workshop comes immediately after the Introduction to Linux session. We will build on what we learned in the previous session and go into detail on how to use the HPC cluster with the Slurm scheduler.
This workshop will provide you with the skills necessary to write a simple Slurm batch script, submit jobs to Slurm, and view and manage your jobs. By the end of the course, you will be familiar with what Slurm is, how it fits into an HPC environment, and how to start using Slurm on our HPC clusters for your research.
This is a beginner-friendly workshop, but you should be familiar with the Linux command line. We recommend bringing your own device to follow along.
*'''Speaker:''' Robert Fridman
*'''Format:''' Workshop + Hands on
*'''Level:''' Introductory
*'''Prerequisites:''' None
|-
!
====Open OnDemand on ARC====
|June 10, 1:00 PM - 1:20 PM
ICT 102
| Did you know you can run a Linux desktop and graphical tools on ARC? This session will cover what ARC Open OnDemand is and how it may help with your research. We will show you how to:
*Connect to Open OnDemand through your browser
*Start a graphical desktop environment on our ARC HPC cluster
*View and manage files in your home directory via Open OnDemand
*Connect to ARC through your web browser
*View the status of your submitted jobs

By the end of this session, you will be familiar with the options available on Open OnDemand and be able to start graphical sessions through this service. This is a beginner-friendly workshop and no prior experience is necessary. We recommend bringing your own device to follow along.
*'''Speaker:''' Leo Leung
*'''Format:''' Lecture + Follow along
*'''Level:''' Introductory
*'''Prerequisites:''' None
|-
!
====Data in Motion: Navigating Storage Solutions for Active Research Data====
|June 12, 2:30 PM - 4:20 PM
ICT 102
|Planning for and requesting specialized storage for large research projects can be a daunting proposition. The variety of storage options, and the justifications expected for allocations at UCalgary, at national supercomputing sites, and in the public cloud, can quickly become overwhelming. This talk aims to provide an introduction to the cost/benefit tradeoffs of different storage systems, guidance on when to reach out to support services around the university for help with critical decisions, and basic techniques for quantitatively justifying a storage request.
By the end of the session, you will be familiar with the types of storage-related questions that should be answered when tackling large research projects, and with the different types of solutions that the University offers our researchers.
*'''Speaker:''' Ian Percel, Jennifer Abel, Alex Thistlewood
*'''Format:''' Lecture
*'''Level:''' Introductory
*'''Prerequisites:''' None
|-
!
====Reproducible Data Management with Datalad====
|'''Part I:''' June 12, 11:00 AM - 11:50 AM
'''Part II:''' June 12, 2:30 PM - 3:20 PM
ICT 102
|Data reproducibility comes from knowledge of data provenance. This two-part workshop introduces DataLad, a Git-based digital data management system that records research provenance. The sessions will demonstrate the creation of a small research project that captures its data provenance.
Content to be covered in the two-part session includes:
*Dataset basics
*Capturing data provenance
*Re-executing analyses
*Collaborative data analysis

*'''Speaker:''' David Deepwell
*'''Format:''' Lecture + demo
*'''Level:''' Introductory
*'''Prerequisites:''' Command line experience; familiarity with Git is advised
|-
!
====Introduction to HPC resources====
|June 11, 9:30AM - 10:20AM
ICT 102
|This session is a primer for those new to high performance computing (HPC) or computing on remote resources. We will build on the foundations from our previous Linux and Slurm introductory sessions and expand on the larger picture, including:
*Motivation for using HPC
*Finding available resources on HPC systems
*Issues and pitfalls to avoid (such as incorrect job resource requests)
*Troubleshooting job failures
*High-level overview of parallel programming with Slurm
*How to transfer data to/from other institutions

This is a beginner-friendly workshop. You should be familiar with the Linux command line. We recommend bringing your own device to follow along.
*'''Speaker:''' Robert Fridman, Dave Schulz
*'''Format:''' Lecture
*'''Level:''' Introductory
*'''Prerequisites:''' Linux command line, Slurm
|-
!
====Linux tools & utilities for working with large data sets====
|June 11, 10:30AM - 11:20AM
ICT 102
|This session introduces intermediate to advanced uses of the Linux environment for handling large data sets. The course will demonstrate the power of shell pipes and how you can work with large datasets using only the standard tools and utilities built into Linux.
We will cover some common use cases, including:
*How to download large datasets from the Internet
*How to parse text-based data using tools such as sed, awk, and grep
*How to build powerful text mining, conversion, and visualization pipelines with just the command line

This is an intermediate course. You should be familiar with the Linux command line and some common Linux utilities prior to the course. Some understanding of regular expressions may be useful.
*'''Speaker:''' Leo Leung, Dave Schulz
*'''Format:''' Lecture
*'''Level:''' Introductory to Intermediate
*'''Prerequisites:''' Command line experience
|-
!
====RCS Q&A period: Ask RCS anything====
|June 11, 11:30AM - 12:00PM
ICT 102
|This is a general question and answer period where you may ask the Research Computing Services team questions related to RCS and HPC. You may ask both technical and non-technical questions.
*'''Speaker:''' The RCS team
*'''Format:''' Question & Answer period
*'''Level:''' Introductory
*'''Prerequisites:''' None
|-
!
===='''Research Data Management and Data File Management'''====
|June 11, 1:00PM - 2:20PM
ICT 102
|Managing your digital files and research materials is critical for keeping yourself organized, collaborating, and communicating with colleagues. In this session, we will cover Research Data Management (RDM) and Data Management Plans (DMPs). We will also go over best practices in digital file management depending on your individual and organizational needs.
This presentation will also discuss file versioning and how to document and share your file and folder conventions using a README file.
By the end of this session, you should be familiar with RDM and DMP concepts to help keep your research materials organized.
*'''Speaker:''' Ingrid Reiche, Jennifer Abel, Alex Thistlewood (from the University of Calgary Libraries and Cultural Resources)
*'''Format:''' Lecture
*'''Level:''' Introductory
*'''Prerequisites:''' None
|-
!
====Introduction to containers with Apptainer====
|June 12, 9:00AM - 9:50AM
ICT 102
|Reproducible research workflows are essential for repeatability. This session will cover the basics of using containers with Apptainer, a secure container technology designed for high performance computing clusters. We will cover:
*When do you want to use a container?
*How do you get a container image?
*How do you run a container image on the cluster?

The instructor for this session will be presenting remotely, and the session will be streamed in ICT 102. We will provide a Zoom link for those who wish to attend virtually.
*'''Speaker:''' Tannistha Nandi
*'''Format:''' Lecture
*'''Level:''' Introductory
*'''Prerequisites:''' None
|-
!
====Managing scientific software with Conda====
|June 12, 10:00 AM - 10:50 AM
ICT 102
|Running customized scientific software in a shared HPC environment can be challenging. In this session, we will go over how to set up customized software environments using Conda.
*'''Speaker:''' Dmitri Rozmanov
*'''Level:''' Introductory
*'''Prerequisites:''' None
|-
!
====Prefect for Research Workflow Development====
|June 11, 9:30 AM - 11:00 AM
ICT 114
|Modernize your research workflows using Prefect, an open source workflow orchestration tool. In this session, we will cover the fundamentals of building workflows with Prefect, with examples of how to deploy Prefect on local and distributed computing infrastructure.
*'''Speaker:''' Pedro Martinez
*'''Format:''' Lecture + Follow Along
*'''Level:''' Introductory
*'''Prerequisites:''' Python, Slurm
|-
!
====AWS: Inspiring the art of the possible====
‎<span id="AWS">‎</span>
|June 11, 1:30 PM - 2:00 PM
ICT 102
|Learn what is possible on the AWS Cloud for research.
*'''Speaker:''' Patrick Colucci
*'''Format:''' Lecture
*'''Level:''' Introductory
*'''Prerequisites:''' None
|-
!
====AWS: How AWS works with Researchers====
|June 11, 2:00 PM - 2:30 PM
ICT 102
| AWS has many programs to support researchers, such as credits, letters of support, immersion days, and proof-of-concept projects. In this session, we will cover how AWS engages with researchers and what programs are available to help accelerate your research with the AWS Cloud.
*'''Speaker:''' Jessica Steed & Hatem Siyala
*'''Format:''' Lecture
*'''Level:''' Introductory
*'''Prerequisites:''' None
|-
!
====AWS: Machine learning with low-code workshop====
|June 11, 2:30 PM - 5:00 PM
ICT 102
| The Machine Learning (ML) journey requires continuous experimentation and rapid prototyping to be successful. In order to create highly accurate and performant models, data scientists have to first experiment with feature engineering, model selection, and optimization techniques. These processes are traditionally time-consuming and expensive.
In this workshop, attendees will learn:
*How the low-code ML capabilities found in Amazon SageMaker Data Wrangler, Autopilot, and Jumpstart make it easier to experiment faster and bring highly accurate models to production more quickly and efficiently
*How to simplify data preparation and feature engineering, and complete each step of the data preparation workflow
*How to automatically build, train, and tune the best machine learning models based on your data, while maintaining full control and visibility
*How to get started with ML easily and quickly using pre-built solutions for common financial use cases and open source models from popular model zoos

*'''Speaker:''' Abhi Sodhani
*'''Format:''' Workshop + Hands on
*'''Level:''' Introductory
*'''Prerequisites:''' None
|-
!
===='''Accelerate data science workflows with NVIDIA RAPIDS'''====
‎<span id="NVIDIA">‎</span>
|June 10, 9:30 AM - 11:50 AM
ICT 114
|Unlock the power of GPU acceleration for your data science projects in our hands-on workshop. This session is designed to introduce participants to NVIDIA RAPIDS, a suite of open-source software libraries and APIs built on CUDA. RAPIDS enables data scientists and analysts to execute end-to-end data science and analytics pipelines entirely on GPUs, significantly speeding up workflows.
In this interactive session, we will:
*Introduce NVIDIA RAPIDS and its possibilities for data scientists
*Run RAPIDS in a Jupyter notebook environment on ARC
*Work with sample datasets to perform data manipulation and visualization tasks
*Explore hands-on coding exercises that illustrate the advantages of GPU-accelerated processing

By the end of this session, you will have the basic practical skills necessary to start using RAPIDS for GPU-accelerated research work on our HPC infrastructure.
*'''Speaker:''' Tarini Bhatnagar from NVIDIA
*'''Format:''' Lecture + Follow Along
*'''Level:''' Introductory
*'''Prerequisites:''' Introductory Python and Pandas recommended
|-
!
====Dell Presentation: TBD====
‎<span id="Dell">‎</span>
|June 12, 1:00 PM - 1:50 PM
ICT 102
|TBD
*'''Speaker:''' Rob Lucas from Dell
*'''Format:''' Lecture
*'''Level:''' Introductory
*'''Prerequisites:''' None
|-
!
==== Fast Dataframes with Polars on Python ====
|June 12, 9:00 AM - 9:50 AM
ICT 114
|Have you used Pandas Dataframes with Python and are unhappy with the performance? Can't seem to figure out Pandas' indexing? In this session, we will go over using fast dataframes with Polars.
Polars dataframe features:


=== AWS ===
* Automatically use all available CPUs for most operations
==== AWS: Inspiring the art of the possible ====
* Can be accessed via SQL queries
ICT 102, 1:30PM - 1:50PM by AWS
* Supports lazy execution


Learn what is possible on AWS Cloud for research.
By the end of this session, you'll have the basic skills to leverage Polars in your Python workflows.


==== AWS: How AWS works with Researchers ====
* '''Speaker:''' Dave Schulz
ICT 102, 1:30PM - 1:50PM by AWS
* '''Format:''' Lecture + Hands on
* '''Level:''' Introductory
* '''Prerequisites:''' Python
|}


AWS has many programs to support researchers such as credits, letter of supports, immersion days, working on proof of concepts. In this session, we will cover how we engage with researchers and what programs are out there to help accelerate your research with the AWS Cloud.


==== AWS: Machine learning with low-code workshop ====
== Materials ==
ICT 102, 1:30 PM - 4:45 PM by AWS
‎<span id="materials">‎</span>Check back soon. Session materials will be posted here after the event.
 
{| class="wikitable"
The Machine Learning (ML) journey requires continuous experimentation and rapid prototyping to be successful. In order to create highly accurate and performant models, data scientists have to first experiment with feature engineering, model selection and  optimization techniques. These processes are traditionally time consuming and expensive. In this workshop attendees will learn the following:
!Session
 
!Course materials
* How the Low-Code ML capabilities found in Amazon SageMaker Data Wrangler, Autopilot and Jumpstart, make it easier to experiment faster and bring highly accurate models to production more quickly and efficiently
|-
* How to simplify the process of data preparation and feature engineering, and complete each step of the data preparation workflow
|Linux tools & utilities for working with large data sets
* Understand how to automatically build, train, and tune the best machine learning models based on your data, while allowing you to maintain full control and visibility.
|
* Get started with ML easily and quickly using pre-built solutions for common financial use cases and open source models from popular model zoos.
* [[Media:SummerSchool2024-linux-pipes-intro.pdf|Download PowerPoint Slide deck,]]
 
* Demo dataset stored at: <code>/global/software/SummerSchool2024/enron</code>
=== NVIDIA ===
|-
 
|Amazon Web Services
==== Workflow Optimization with NVIDIA GPUs ====
|
ICT 102, 9:30AM - 12:20AM by NVIDIA
* [[:File:SummerSchool2024-aws-research.pdf|Download PowerPoint Slide deck - AWS for Research]]
 
* [[:File:SummerSchool2024-aws-bedrock.pdf|Download PowerPoint Slide deck - AWS Bedrock]]
We will discuss how to optimizing workflows with NVIDIA powered GPUs to help accelerate your research.
* [https://aws.amazon.com/executive-insights/innovation/?executive-insights-cards.sort-by=item.additionalFields.sortDate&executive-insights-cards.sort-order=desc&awsf.filter-content-type=*all#Fostering_a_culture_of_innovation AWS Culture for Innovation]
 
|-
=== Dell & AMD ===
|Prefect for Research Workflow Development
 
|[[Media:Summer-school-prefect-20240531.pdf|Download PowerPoint Slide deck]]
==== Machine learning with Dell & AMD ====
https://github.com/peterg1t/prefect-summer-school
ICT 102, 1:00PM - 1:50PM by Rob Lucas
|-
|Open OnDemand
|[[Media:SummerSchool2024-open-ondemand.pdf|Download PowerPoint Slide deck]]
|-
|The Alliance presentation
|[[Media:SummerSchool2024-alliance-presentation.pdf|Download PowerPoint Slide deck]]
|-
|Introduction to HPC Containers with Apptainer
|[[Media:SummerSchool2024-apptainer-containers.pdf|Download PowerPoint Slide deck]]
|-
|Polars Dataframes
|[[Media:PolarsSession.ipynb.zip|Download Polars Jupyter Notebook]]
Input Data is available On the Arc cluster under /global/software/SummerSchool2024/Polars/data
|-
|NVIDIA RAPIDS
|[[Media:SummerSchool2024-nvidia-presentation.pdf|Download PowerPoint Slide deck]]
|-
|Library File Management
|[[Media:SummerSchool2024-library-filemanagement.pdf|Download PowerPoint Slide deck]]
|-
|Managing Scientific Software with Conda
|[[Media:SummerSchool2024-conda.pdf|Download PowerPoint Slide deck]]
|-
|Prefect for Research Workflow Development
|[[Media:Summer-school-prefect-20240531.pdf|Download PowerPoint Slide deck]]
|-
|Introduction to the Linux Command Line
|[[Media:SummerSchool2024-Introduction to the Linux Command Line 2024 V2.pdf|Download PowerPoint Slide deck]]
|-
|Introduction to HPC Resources
|[[Media:SummerSchool2024-Introduction to HPC resources 2024.pdf|Download PowerPoint Slide deck]]
|-
|Introduction to DataLad
|[[Media:SummerSchool2024-DataLad.pdf|Download PowerPoint Slide deck]] [[Media:SummerSchool2024-DataLad-Scripts.tgz|Download Example Scripts]]
|}


To be announced.
== Frequently Asked Questions ==


__NOTOC__
; Can I attend this event remotely?
: We are only offering these sessions in-person on the specified dates in ICT 102 and ICT 114.
; Will there be recordings made available afterwards?
: The sessions will not be recorded.
; Can I forward this invitation to others?
: You are welcome to forward this invite to any faculty, staff, and students at the University of Calgary.
; I am only interested in one of the sessions in the morning/afternoon period. Can I drop in at anytime to these sessions?
: You may show up to only the session that interests you, but we ask that you register for the entire morning/afternoon session that covers the session you are interested in.
; Will there be any free food?
: We will be providing snacks and refreshments during this event. There will be gluten-free and vegetarian options. The food will be placed in ICT 114 and is available to anyone registered to the event.
; What happens if there is no more seats available?
: We will be offering up to 100 seats for this session. We may be able to raise this depending on the interest in the sessions. If you are interested in joining a session that has filled, please join the waiting list when ordering the ticket and reach out to us for options.
; I have more questions. Who do I contact?
: Please reach out to the Research Computing Services team at support [at] hpc.ucalgary.ca.__NOTOC__

Latest revision as of 18:37, 12 July 2024

Research Computing Services' 3rd annual summer school offers a handful of courses covering a wide range of topics to help empower your research. We will cover topics including Linux/Slurm, ARC/HPC, Research Data Management (RDM) and Data Management Plans (DMPs), working with research software and workflows, plus much more. The sessions and workshops range from introductory to intermediate levels and are suitable for anyone interested in research computing on HPC.

The summer school will run from Monday, June 10 through Wednesday, June 12, 2024, from 9AM to 5PM. This 3-day event is completely free to all University of Calgary members.

RCS Summer School 2024 Poster

Survey

Please complete our post-event survey to help us improve our future sessions. Each completed survey will enter you into a draw for a $50 gift card. We will contact the winner after the summer school.


Complete Post-Event Survey

Venue

The RCS Summer School 2024 will be held in ICT 102 and ICT 114.

Prior to attending, please note the following:

  • We will be doing registration and check-in outside ICT 102. Please pick up your name tag here.
  • Please arrive 5-15 minutes before the start of the morning or afternoon events to prevent delays (and to pick up a coffee or snack on your way into the sessions).
  • Please bring your own device for the follow-along and workshop sessions. We will be in a lecture theatre with no lab computers.
  • We will automatically provision access to our Teaching and Learning Cluster (TALC) for all attendees. Instructions on how to connect will be provided during the session.

Directions

Visit the UCalgary campus maps and room finder at: https://www.ucalgary.ca/about/our-campuses/campus-maps-and-room-finder

The ICT building floor map is available at: https://ucalgary-gs.maps.arcgis.com/sharing/rest/content/items/236901d34b62420a87e99b947eae9e71/data.

Schedule

The summer school sessions will be held in ICT 102 and ICT 114. Refreshments will be available in ICT 114 on all 3 days and are open to anyone registered for the event, regardless of track.

June 10
  • 8:30 AM: Morning registration & check-in (ICT 102)
  • 9:00 AM - 9:20 AM: Introduction to the Summer School & RCS (Track 1, ICT 102), Jill Kowalchuk
  • 9:30 AM - 10:30 AM: Introduction to Linux, Bash, and the command line (Track 1, ICT 102), Robert Fridman
  • 9:30 AM - 11:50 AM: Accelerate data science workflows with NVIDIA RAPIDS (Track 2, ICT 114), Tarini Bhatnagar
  • 10:30 AM - 11:50 AM: Hands on with Linux & Slurm (Track 1, ICT 102), Robert Fridman
  • 12:00 PM - 1:00 PM: Lunch break, afternoon registration & check-in
  • 1:00 PM - 1:20 PM: Open OnDemand on ARC (Track 1, ICT 102), Leo Leung
  • 1:30 PM - 2:00 PM: AWS: Inspiring the art of the possible (Track 1, ICT 102), Patrick Colucci
  • 2:00 PM - 2:30 PM: AWS: How AWS works with Researchers (Track 1, ICT 102), Jessica Steed & Hatem Siyala
  • 2:30 PM - 5:00 PM: AWS: Machine Learning with low-code workshop (Track 1, ICT 102), Abhi Sodhani
  • Afternoon: No Track 2; refreshments available in ICT 114
  • End of day: 5:00 PM

June 11
  • 8:30 AM: Morning registration & check-in (ICT 102)
  • 9:00 AM - 9:20 AM: The Alliance: An Introduction (Track 1, ICT 102), Brock Kahanyshyn
  • 9:30 AM - 10:20 AM: Introduction to HPC resources (Track 1, ICT 102), Robert Fridman, Dave Schulz
  • 9:30 AM - 11:00 AM: Prefect for Research Workflow Development (Track 2, ICT 114), Pedro Martinez
  • 10:30 AM - 11:20 AM: Linux tools & utilities for working with large data sets (Track 1, ICT 102), Leo Leung, Dave Schulz
  • 11:00 AM: Track 2 ends; refreshments available in ICT 114
  • 11:30 AM - 12:00 PM: RCS Q&A period: Ask RCS anything (Track 1, ICT 102), RCS Team
  • 12:00 PM - 1:00 PM: Lunch break, afternoon registration & check-in
  • 1:00 PM - 2:20 PM: Research Data Management and Data File Management (Track 1, ICT 102), Ingrid Reiche, Jennifer Abel, Alex Thistlewood
  • 2:30 PM - 4:20 PM: Data in Motion: Navigating Storage Solutions for Active Research Data (Track 1, ICT 102), Ian Percel, Jennifer Abel, Alex Thistlewood
  • Afternoon: No Track 2; refreshments available in ICT 114
  • End of day: 4:30 PM

June 12
  • 8:30 AM: Morning registration & check-in (ICT 102)
  • 9:00 AM - 9:50 AM: Introduction to containers with Apptainer (Track 1, ICT 102), Tannistha Nandi
  • 9:00 AM - 9:50 AM: Fast Dataframes with Polars on Python (Track 2, ICT 114), Dave Schulz
  • 10:00 AM - 10:50 AM: Managing scientific software with Conda (Track 1, ICT 102), Dmitri Rozmanov
  • 10:00 AM: Track 2 ends; refreshments available in ICT 114
  • 11:00 AM - 11:50 AM: Reproducible Data Management with Datalad: Part I (Track 1, ICT 102), David Deepwell
  • 12:00 PM - 1:00 PM: Lunch break, afternoon registration & check-in
  • 1:00 PM - 2:30 PM: Dell & AMD: Machine learning with Dell & AMD (Track 1, ICT 102), Rob Parish
  • 2:30 PM - 3:20 PM: Reproducible Data Management with Datalad: Part II (Track 1, ICT 102), David Deepwell
  • Afternoon: No Track 2; refreshments available in ICT 114
  • End of day: 3:30 PM

Sessions


Introduction to the Summer School & RCS

June 10, 9:00AM - 9:20AM

ICT 102

We will begin the RCS summer school with a quick introduction by Jill Kowalchuk, the Interim Director of Research Computing Services. We will introduce the RCS team, provide a high-level overview of our services, and explain how to get help and support from our analysts.
  • Speaker: Jill Kowalchuk
  • Format: Lecture
  • Level: Introductory
  • Prerequisites: None

Introduction to Linux, Bash, and the command line

June 10, 9:30AM - 10:30AM

ICT 102

This course provides you with the essential skills to use the Linux command line effectively. Starting from the ground up, we will go over how to log in to and interact with our HPC cluster, traverse the filesystem, execute programs, and manage files.

This beginner-friendly session requires no prior Linux experience. We recommend bringing your own device to follow along. By the end of the course, you should be familiar with what is possible with the Linux command line.

  • Speaker: Robert Fridman
  • Format: Lecture + Follow along
  • Level: Introductory
  • Prerequisites: None

Workshop: Hands on with Linux & Slurm

June 10, 10:30AM - 11:50 AM

ICT 102

This follow-up workshop comes immediately after the Introduction to Linux session. We will build on what we learned in the previous session and go into detail on how to use the HPC cluster with the Slurm scheduler.

This workshop will provide you with the skills necessary to write a simple Slurm batch script, submit jobs to Slurm, and view and manage your jobs. By the end of the course, you will be familiar with what Slurm is, how it fits into an HPC environment, and how to start using Slurm on our HPC clusters for your research.

This is a beginner-friendly workshop. You should be familiar with the Linux command line. We recommend bringing your own device to follow along.

  • Speaker: Robert Fridman
  • Format: Workshop + Hands on
  • Level: Introductory
  • Prerequisites: None
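To show roughly what the batch script covered in this workshop looks like, the Python sketch below assembles a minimal Slurm job script as text. The job name, resource values, and the `hello.py` payload are hypothetical placeholders, not ARC defaults or limits. On a real cluster you would save the text to a file, submit it with `sbatch`, check on it with `squeue -u $USER`, and cancel it with `scancel <jobid>` if needed.

```python
"""Build a minimal Slurm batch script (illustrative values only)."""


def make_batch_script(job_name: str, ntasks: int, mem: str, time_limit: str,
                      command: str) -> str:
    """Return the text of a Slurm batch script.

    All resource values passed in are hypothetical examples; check your
    cluster's documentation for valid partitions and limits.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --ntasks={ntasks}",
        f"#SBATCH --mem={mem}",          # memory per node, e.g. 1G
        f"#SBATCH --time={time_limit}",  # walltime limit as HH:MM:SS
        "",
        command,                         # the work the job actually runs
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    script = make_batch_script("hello", 1, "1G", "00:10:00", "python hello.py")
    # Save this as job.slurm, then submit on the cluster with: sbatch job.slurm
    print(script, end="")
```

The `#SBATCH` lines are directives that Slurm reads before the job starts. Everything after them runs as an ordinary shell script on the allocated node.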

Open OnDemand on ARC

June 10, 1:00 PM - 1:20 PM

ICT 102

Did you know you can run a Linux desktop and graphical tools on ARC? This session will cover what ARC Open OnDemand is and how it may help with your research. We will show you how to:
  • Connect to Open OnDemand through your browser
  • Start a graphical desktop environment in our ARC HPC cluster environment
  • View and manage files in your home directory via Open OnDemand
  • Connect to ARC through your web browser
  • View the status of your submitted jobs

By the end of this session, you will be familiar with the options available on Open OnDemand and will be able to start graphical sessions through this service. This is a beginner-friendly workshop and no prior experience is necessary. We recommend bringing your own device to follow along.

  • Speaker: Leo Leung
  • Format: Lecture + Follow along
  • Level: Introductory
  • Prerequisites: None

Data in Motion: Navigating Storage Solutions for Active Research Data

June 11, 2:30 PM - 4:20 PM

ICT 102

Planning for and requesting specialized storage for large research projects can be a daunting proposition. The variety of storage options, and the justifications expected for allocations at UCalgary, at national supercomputing sites, and in the public cloud, can quickly become overwhelming. This talk introduces the cost/benefit tradeoffs of different storage systems, explains when to reach out to the various support services around the university for help with critical decisions, and covers basic techniques for providing a quantitative justification for a storage request.

By the end of the session, you will be familiar with the storage-related questions that should be answered when tackling large research projects and with the different types of solutions that the University offers its researchers.

  • Speaker: Ian Percel, Jennifer Abel, Alex Thistlewood
  • Format: Lecture
  • Level: Introductory
  • Prerequisites: None
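One simple way to put numbers behind a storage request is to project growth over the life of the project. The sketch below assumes linear growth plus a fixed number of full copies (for example, a primary copy and one backup). The figures in the example are made up. A real request should follow the guidance of whichever storage provider you apply to.

```python
def projected_storage_tb(initial_tb: float, monthly_growth_tb: float,
                         months: int, copies: int = 1) -> float:
    """Estimate the total storage (TB) needed by the end of a project.

    Assumes linear growth: the initial data plus a constant amount added each
    month, multiplied by the number of copies kept (e.g. 2 = data + backup).
    """
    if months < 0 or copies < 1:
        raise ValueError("months must be >= 0 and copies >= 1")
    return (initial_tb + monthly_growth_tb * months) * copies


# Hypothetical project: 5 TB today, +0.5 TB/month for 36 months, one backup copy.
print(projected_storage_tb(5.0, 0.5, 36, copies=2))  # (5 + 18) * 2 = 46.0
```

If your data grows faster than linearly, for example because a new instrument comes online partway through the project, a piecewise estimate per project phase is usually easier to defend than a single growth rate.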

Reproducible Data Management with Datalad

Part I: June 12, 11:00 AM - 11:50 AM

Part II: June 12, 2:30 PM - 3:20 PM

ICT 102

Data reproducibility depends on knowing your data's provenance. This two-part workshop introduces DataLad, a Git-based data management system that records research provenance. The sessions will demonstrate how to create a small research project that captures its data provenance.

Content to be covered in the two-part session includes:

  • Dataset basics
  • Capturing data provenance
  • Re-executing analyses
  • Collaborative data analysis


  • Speaker: David Deepwell
  • Format: Lecture + demo
  • Level: Introductory
  • Prerequisites: Command line experience, familiarity with Git is advised
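DataLad captures provenance by recording the command that produced a result (for example, with `datalad run`) as part of the dataset's version history. The standard-library sketch below is only a toy model of that idea and does not reproduce DataLad's actual record format. It stores a command together with SHA-256 checksums of its input and output files in a JSON record. The function names and record fields are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def provenance_record(command: str, inputs: list[Path], outputs: list[Path]) -> str:
    """Return a JSON record linking a command to checksums of its files.

    Toy format for illustration only; DataLad keeps its own run records.
    """
    record = {
        "command": command,
        "inputs": {str(p): sha256_of(p) for p in inputs},
        "outputs": {str(p): sha256_of(p) for p in outputs},
    }
    return json.dumps(record, indent=2, sort_keys=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        raw = Path(workdir, "raw.csv")
        raw.write_text("x\n1\n2\n")
        total = Path(workdir, "total.txt")
        total.write_text("3\n")
        # Hypothetical analysis command that turned raw.csv into total.txt
        print(provenance_record("python sum.py raw.csv > total.txt", [raw], [total]))
```

Because the checksums change whenever file contents change, such a record lets you check later whether an output still corresponds to the inputs and command that were recorded for it. This is the question DataLad answers for you at the scale of a whole dataset.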

Introduction to HPC resources

June 11, 9:30AM - 10:20AM

ICT 102

This session is a primer for those new to high performance computing (HPC) or to computing on remote resources. We will build on the foundations laid in our earlier Linux and Slurm introductory sessions and look at the bigger picture, including:
  • Motivation for using HPC
  • Finding available resources on HPC systems
  • Issues and pitfalls to avoid (such as incorrect job resource requests)
  • Troubleshooting job failures
  • High-level overview of parallel programming with Slurm
  • How to transfer data to/from other institutions

This is a beginner-friendly workshop. You should be familiar with the Linux command line. We recommend bringing your own device to follow along.

  • Speaker: Robert Fridman, Dave Schulz
  • Format: Lecture
  • Level: Introductory
  • Prerequisites: Linux command line, Slurm
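A common symptom of an incorrect resource request is a job that asks for far more walltime than it uses. As a hypothetical sketch, the helper below compares a job's elapsed time with its requested time limit. It assumes both values are `[D-]HH:MM:SS` strings, the style in which Slurm commonly displays durations. Other Slurm time formats are not handled. The 1-day request in the example is invented.

```python
def to_seconds(duration: str) -> int:
    """Convert a '[D-]HH:MM:SS' duration string to seconds (assumed format)."""
    days = 0
    if "-" in duration:
        day_part, duration = duration.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(x) for x in duration.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def time_efficiency(elapsed: str, time_limit: str) -> float:
    """Fraction of the requested walltime that the job actually used."""
    return to_seconds(elapsed) / to_seconds(time_limit)


# Hypothetical job: requested 1 day, finished in 2 hours.
print(f"{time_efficiency('02:00:00', '1-00:00:00'):.1%}")  # 8.3%
```

A consistently low ratio suggests that the time limit (and often the memory and CPU request) could be reduced. Smaller requests are generally easier for the scheduler to fit, so such jobs may start sooner.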

Linux tools & utilities for working with large data sets

June 11, 10:30AM - 11:20AM

ICT 102

This session introduces intermediate to advanced uses of the Linux environment for handling large data sets. The course will demonstrate the power of shell pipes and show how you can work with large datasets using only the standard tools and utilities built into Linux.

We will cover some common use cases, including:

  • How to download large datasets from the Internet
  • How to parse text-based data using tools such as sed, awk, and grep
  • How to build powerful text-mining, conversion, and visualization pipelines with just the command line

This is an intermediate course. You should be familiar with the Linux command line and some common Linux utilities prior to the course. Some understanding of regular expressions may be useful.

  • Speaker: Leo Leung, Dave Schulz
  • Format: Lecture
  • Level: Introductory to Intermediate
  • Prerequisites: Command line experience
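The session itself uses shell tools such as `grep`, `awk`, `sort`, and `uniq`. For readers more comfortable in Python, the sketch below is a rough stand-in for a pipeline like `grep '^From:' file | awk '{print $2}' | sort | uniq -c | sort -rn`. Like a shell pipe, it processes input one line at a time using generators, so memory use does not grow with file length. The email-style `From:` field is a hypothetical example and does not describe the session's demo data.

```python
import re
from collections import Counter
from typing import Iterable, Iterator

FROM_LINE = re.compile(r"^From:\s*(\S+)")  # hypothetical field of interest


def senders(lines: Iterable[str]) -> Iterator[str]:
    """Yield the first token after 'From:' (like grep + awk '{print $2}')."""
    for line in lines:
        match = FROM_LINE.match(line)
        if match:
            yield match.group(1)


def top_senders(lines: Iterable[str], n: int = 3) -> list[tuple[str, int]]:
    """Count senders and return the n most common (like sort | uniq -c | sort -rn)."""
    return Counter(senders(lines)).most_common(n)


if __name__ == "__main__":
    sample = [
        "From: alice@example.org",
        "Subject: hello",
        "From: bob@example.org",
        "From: alice@example.org",
    ]
    for sender, count in top_senders(sample):
        print(f"{count:7d} {sender}")
```

To process a large file instead of the sample list, pass an open file object (for example, `open(path)`). Python iterates over it line by line in the same streaming fashion.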

RCS Q&A period: Ask RCS anything

June 11, 11:30AM - 12:00PM

ICT 102

This is a general question-and-answer period where you may ask the Research Computing Services team questions related to RCS and HPC. You may ask both technical and non-technical questions.
  • Speaker: The RCS team
  • Format: Question & Answer period
  • Level: Introductory
  • Prerequisites: None

Research Data Management and Data File Management

June 11, 1:00PM - 2:20PM

ICT 102

Managing your digital files and research materials is critical for keeping yourself organized, collaborating, and communicating with colleagues. In this session, we will cover Research Data Management (RDM) and Data Management Plans (DMPs). We will also go over best practices in digital file management for your individual and organizational needs.

This presentation will also discuss file versioning and how to document and share your file and folder naming conventions using a README file. By the end of this session, you should be familiar with RDM and DMP concepts that help keep your research materials organized.

  • Speaker: Ingrid Reiche, Jennifer Abel, Alex Thistlewood (from The University of Calgary Libraries and Cultural Resources)
  • Format: Lecture
  • Level: Introductory
  • Prerequisites: None
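The sketch below gives one example of a file-naming convention that a README could document. It builds file names from an ISO 8601 date, a short project code, a description, and a zero-padded version number, with spaces removed. This particular convention is a hypothetical illustration, not a recommendation from the Libraries. Whatever convention you adopt is what belongs in your README.

```python
import re
from datetime import date


def slug(text: str) -> str:
    """Lowercase text and collapse runs of non-alphanumerics into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def standard_filename(day: date, project: str, description: str,
                      version: int, extension: str) -> str:
    """Return a name like '2024-06-11_rcs_survey-results_v02.csv'.

    Hypothetical convention: ISO 8601 date first so files sort
    chronologically, lowercase hyphenated words with no spaces, and a
    zero-padded version number so v10 sorts after v09.
    """
    ext = extension.lstrip(".")
    return f"{day.isoformat()}_{slug(project)}_{slug(description)}_v{version:02d}.{ext}"


print(standard_filename(date(2024, 6, 11), "RCS", "Survey results", 2, "csv"))
# 2024-06-11_rcs_survey-results_v02.csv
```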

Introduction to containers with Apptainer

June 12, 9:00AM - 9:50AM

ICT 102

Reproducible research workflows are essential for repeatability. This session will cover the basics of using containers with Apptainer, a secure container technology designed for high performance computing clusters. We will cover:
  • When do you want to use a container?
  • How do you get a container image?
  • How do you run a container image on the cluster?

The instructor for this session will be remote, and the session will be streamed into ICT 102. We will provide a Zoom link for those who wish to attend virtually.

  • Speaker: Tannistha Nandi
  • Format: Lecture
  • Level: Introductory
  • Prerequisites: None
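As a concrete illustration of getting and running a container image, the Python sketch below assembles the command lines without running them. `apptainer pull` and `apptainer exec` with `--bind` are standard Apptainer subcommands. The image name, bind path, and `analysis.py` script are hypothetical. On the cluster you would typically run the `exec` command inside a Slurm job.

```python
import shlex


def pull_command(source: str, sif_path: str) -> list[str]:
    """Command that downloads an image (e.g. from Docker Hub) into a .sif file."""
    return ["apptainer", "pull", sif_path, source]


def exec_command(sif_path: str, binds: list[str], program: list[str]) -> list[str]:
    """Command that runs a program inside the container image."""
    cmd = ["apptainer", "exec"]
    for path in binds:
        cmd += ["--bind", path]  # make a host directory visible inside the container
    return cmd + [sif_path] + program


if __name__ == "__main__":
    # Hypothetical image, bind path, and script; print shell-ready commands.
    print(shlex.join(pull_command("docker://python:3.12-slim", "python.sif")))
    print(shlex.join(exec_command("python.sif", ["/scratch"], ["python3", "analysis.py"])))
```

Building the argument list explicitly, rather than concatenating one string, avoids quoting mistakes when paths contain spaces. The same lists could be passed directly to `subprocess.run`.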

Managing scientific software with Conda

June 12, 10:00 AM - 10:50 AM

ICT 102

Running customized scientific software in a shared HPC environment can be challenging. In this session, we will go over how to set up customized software environments using Conda.
  • Speaker: Dmitri Rozmanov
  • Level: Introductory
  • Prerequisites: None

Prefect for Research Workflow Development

June 11, 9:30 AM - 11:00 AM

ICT 114

Modernize your research workflows using Prefect, an open source workflow orchestration tool. In this session we will cover some of the fundamentals of building workflows with Prefect, with examples on how to deploy Prefect on local and distributed computing infrastructure.
  • Speaker: Pedro Martinez
  • Format: Lecture + Follow Along
  • Level: Introductory
  • Prerequisites: Python, Slurm
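Much of the resilience that workflow tools like Prefect provide comes from automatically retrying tasks that fail. Recent Prefect versions expose this through options such as `retries` and `retry_delay_seconds` on the `@task` decorator. The standard-library sketch below is not Prefect code. It hand-rolls a simple retry decorator only to show the mechanism, and the flaky `fetch_data` step is hypothetical.

```python
import functools
import time


def retry(times: int, delay_seconds: float = 0.0):
    """Retry the wrapped function up to `times` extra attempts on any exception."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(times + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == times:
                        raise  # out of retries: surface the last error
                    time.sleep(delay_seconds)
        return wrapper
    return decorator


calls = {"n": 0}


@retry(times=2)
def fetch_data() -> str:
    """Hypothetical flaky step: fails on the first call, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 2:
        raise ConnectionError("transient failure")
    return "data"


print(fetch_data(), calls["n"])  # data 2
```

Retrying every exception is a deliberate simplification. In practice you would usually retry only transient errors, such as network timeouts, and let genuine bugs fail immediately.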

AWS: Inspiring the art of the possible

June 10, 1:30 PM - 2:00 PM

ICT 102

Learn what is possible on AWS Cloud for research.
  • Speaker: Patrick Colucci
  • Format: Lecture
  • Level: Introductory
  • Prerequisites: None

AWS: How AWS works with Researchers

June 10, 2:00 PM - 2:30 PM

ICT 102

AWS has many programs to support researchers, such as credits, letters of support, immersion days, and proof-of-concept work. In this session, we will cover how we engage with researchers and what programs are available to help accelerate your research with the AWS Cloud.
  • Speaker: Jessica Steed & Hatem Siyala
  • Format: Lecture
  • Level: Introductory
  • Prerequisites: None

AWS: Machine learning with low-code workshop

June 10, 2:30 PM - 5:00 PM

ICT 102

The Machine Learning (ML) journey requires continuous experimentation and rapid prototyping to be successful. To create highly accurate and performant models, data scientists first have to experiment with feature engineering, model selection, and optimization techniques. These processes are traditionally time-consuming and expensive.

In this workshop, attendees will learn:

  • How the low-code ML capabilities in Amazon SageMaker Data Wrangler, Autopilot, and JumpStart make it easier to experiment faster and bring highly accurate models to production more quickly and efficiently
  • How to simplify data preparation and feature engineering, and complete each step of the data preparation workflow
  • How to automatically build, train, and tune the best machine learning models based on your data, while maintaining full control and visibility
  • How to get started with ML easily and quickly using pre-built solutions for common financial use cases and open source models from popular model zoos

  • Speaker: Abhi Sodhani
  • Format: Workshop + Hands on
  • Level: Introductory
  • Prerequisites: None

Accelerate data science workflows with NVIDIA RAPIDS

June 10, 9:30 AM - 11:50 AM

ICT 114

Unlock the power of GPU acceleration for your data science projects in our hands-on workshop. This session is designed to introduce participants to NVIDIA RAPIDS, a suite of open-source software libraries and APIs built on CUDA. RAPIDS enables data scientists and analysts to execute end-to-end data science and analytics pipelines entirely on GPUs, significantly speeding up workflows.

In this interactive session, we will:

  • Introduce NVIDIA RAPIDS and its possibilities for data scientists
  • Run RAPIDS in a Jupyter notebook environment on ARC
  • Work with sample datasets to perform data manipulation and visualization tasks
  • Explore hands-on coding exercises that illustrate the advantages of GPU accelerated processing

By the end of this session, you will have the basic practical skills necessary to start using RAPIDS for GPU-accelerated research work on our HPC infrastructure.

  • Speaker: Tarini Bhatnagar from NVIDIA
  • Format: Lecture + Follow Along
  • Level: Introductory
  • Prerequisites: Introductory Python and Pandas recommended

Dell Presentation: TBD

June 12, 1:00 PM - 1:50 PM

ICT 102

TBD
  • Speaker: Rob Lucas from Dell
  • Format: Lecture
  • Level: Introductory
  • Prerequisites: None

Fast Dataframes with Polars on Python

June 12, 9:00 AM - 9:50 AM

ICT 114

Have you used Pandas Dataframes with Python and are unhappy with the performance? Can't seem to figure out Pandas' indexing? In this session, we will go over using fast dataframes with Polars.

Polars dataframe features:

  • Automatically uses all available CPU cores for most operations
  • Can be queried using SQL
  • Supports lazy execution

By the end of this session, you'll have the basic skills to leverage Polars in your Python workflows.

  • Speaker: Dave Schulz
  • Format: Lecture + Hands on
  • Level: Introductory
  • Prerequisites: Python


Materials

Check back soon. Session materials will be posted here after the event.

  • Linux tools & utilities for working with large data sets: Download PowerPoint Slide deck. Demo dataset stored at: /global/software/SummerSchool2024/enron
  • Amazon Web Services: Download PowerPoint Slide deck - AWS for Research; Download PowerPoint Slide deck - AWS Bedrock; AWS Culture for Innovation
  • Prefect for Research Workflow Development: Download PowerPoint Slide deck; https://github.com/peterg1t/prefect-summer-school
  • Open OnDemand: Download PowerPoint Slide deck
  • The Alliance presentation: Download PowerPoint Slide deck
  • Introduction to HPC Containers with Apptainer: Download PowerPoint Slide deck
  • Polars Dataframes: Download Polars Jupyter Notebook. Input data is available on the ARC cluster under /global/software/SummerSchool2024/Polars/data
  • NVIDIA RAPIDS: Download PowerPoint Slide deck
  • Library File Management: Download PowerPoint Slide deck
  • Managing Scientific Software with Conda: Download PowerPoint Slide deck
  • Introduction to the Linux Command Line: Download PowerPoint Slide deck
  • Introduction to HPC Resources: Download PowerPoint Slide deck
  • Introduction to DataLad: Download PowerPoint Slide deck; Download Example Scripts

Frequently Asked Questions

Can I attend this event remotely?
We are only offering these sessions in-person on the specified dates in ICT 102 and ICT 114.
Will there be recordings made available afterwards?
The sessions will not be recorded.
Can I forward this invitation to others?
You are welcome to forward this invite to any faculty, staff, and students at the University of Calgary.
I am only interested in one of the sessions in the morning/afternoon period. Can I drop in at any time for that session?
You may attend only the session that interests you, but we ask that you register for the entire morning or afternoon block that includes it.
Will there be any free food?
We will be providing snacks and refreshments during this event, with gluten-free and vegetarian options. The food will be placed in ICT 114 and is available to anyone registered for the event.
What happens if there are no more seats available?
We will be offering up to 100 seats for this event. We may be able to increase this depending on interest in the sessions. If a session you want to join is full, please join the waiting list when ordering your ticket and reach out to us for options.
I have more questions. Who do I contact?
Please reach out to the Research Computing Services team at support [at] hpc.ucalgary.ca.