ARC Cluster Status

ARC status: Cluster maintenance complete.

System is functioning normally. No known issues.

See the ARC Cluster Status page for system notices.

System Messages

⚠️ January System Updates - 2023/01/01

Beginning January 16, 2023, the ARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.

The ARC login node will reboot on the morning of January 16. Please save your work and log out if possible.

The upgrade is planned to be fully complete by January 20.

If you encounter any system issues, do not hesitate to let us know.

Thank you for your cooperation.

************************************************************************
2023/01/01
--- ⚠️ January System Updates

Beginning January 16, 2023, the ARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.

The ARC login node will reboot on the morning of January 16. Please save your work and log out if possible.

The upgrade is planned to be fully complete by January 20.

If you encounter any system issues, do not hesitate to let us know.

Thank you for your cooperation.

System Updates Completed - 2023/01/24

The upgrade has been completed. The following has been changed:

OS Updated to Rocky Linux 8.7
Slurm updated to 22.05.7
Apptainer replaces Singularity
Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted

If you encounter any system issues, do not hesitate to let us know.

Thank you for your cooperation.

************************************************************************
2023/01/24
--- System Updates Completed

The upgrade has been completed. The following has been changed:

OS Updated to Rocky Linux 8.7
Slurm updated to 22.05.7
Apptainer replaces Singularity
Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted

If you encounter any system issues, do not hesitate to let us know.

Thank you for your cooperation.

Filesystem Issues - 2023/02/28

We are currently investigating a filesystem issue that is causing filesystem slowdowns across ARC.

We will update you with more information as it becomes available.

Thank you for your patience.

************************************************************************
2023/02/28
--- Filesystem Issues

We are currently investigating a filesystem issue that is causing filesystem slowdowns across ARC.

We will update you with more information as it becomes available.

Thank you for your patience.

Filesystem Issues - 2023/03/1

We are still currently investigating a filesystem issue that is causing filesystem slowdowns across ARC. Some jobs on ARC have been paused to help us find the root cause of the slowdowns.

We will update you with more information as it becomes available.

Thank you for your patience.

************************************************************************
2023/03/1
--- Filesystem Issues

We are still currently investigating a filesystem issue that is causing filesystem slowdowns across ARC. Some jobs on ARC have been paused to help us find the root cause of the slowdowns.

We will update you with more information as it becomes available.

Thank you for your patience.

ARC Login node reboot - 2023/03/2

The ARC login node will be rebooted this afternoon for an emergency maintenance. This downtime is needed to help mitigate the filesystem slowdowns experienced on the login node. Jobs will continue running and scheduling during this time.

All logins to the ARC login node will be terminated at 3:00PM and will remain unavailable until 4:00PM.

We apologize for the inconvenience and thank you for your patience.

************************************************************************
2023/03/2
--- ARC Login node reboot

The ARC login node will be rebooted this afternoon for an emergency maintenance. This downtime is needed to help mitigate the filesystem slowdowns experienced on the login node.  Jobs will continue running and scheduling during this time.

All logins to the ARC login node will be terminated at 3:00PM and will remain unavailable until 4:00PM.

We apologize for the inconvenience and thank you for your patience.

⚠️ Filesystem Issues - 2023/03/2

We are still currently investigating a filesystem issue that is causing filesystem slowdowns on specific nodes in our MSRDC location.

We will update you with more information as it becomes available.

We apologize for the inconvenience and thank you for your patience.

************************************************************************
2023/03/2
--- ⚠️ Filesystem Issues

We are still currently investigating a filesystem issue that is causing filesystem slowdowns on specific nodes in our MSRDC location.

We will update you with more information as it becomes available.

We apologize for the inconvenience and thank you for your patience.

Filesystem Issues Resolved - 2023/03/10

We have upgraded the filesystem routers in our MSRDC location to address the performance issues.

Please let us know if you experience any issues with the filesystem performance.

Thank-you for your patience.

************************************************************************
2023/03/10
--- Filesystem Issues Resolved

We have upgraded the filesystem routers in our MSRDC location to address the performance issues.

Please let us know if you experience any issues with the filesystem performance.

Thank-you for your patience.

Open OnDemand reboot - 2023/05/01

On May 1, 2023, the ARC Open OnDemand node will be rebooted between 5PM and 6PM. Expected downtime will be approximately 15 minutes.

If you encounter any system issues, do not hesitate to let us know.

Thank you for your cooperation.

************************************************************************
2023/05/01
--- Open OnDemand reboot

On May 1, 2023, the ARC Open OnDemand node will be rebooted between 5PM and 6PM. Expected downtime will be approximately 15 minutes.

If you encounter any system issues, do not hesitate to let us know.

Thank you for your cooperation.

Apptainer (Singularity) on ARC Login Node - 2023/06/22

Apptainer (Singularity) containers may experience an error when

running on the Arc login node. If apptainer complains that a system administrator needs to enable user namespaces, simply run your containers inside a job.

This is a temporary measure due to security vulnerability that will be

patched soon.

************************************************************************
2023/06/22
--- Apptainer (Singularity) on ARC Login Node

Apptainer (Singularity) containers may experience an error when

running on the Arc login node. If apptainer complains that a system administrator needs to enable user namespaces, simply run your containers inside a job.

This is a temporary measure due to security vulnerability that will be patched soon.

Lattice, Single, cpu2013 partition changes - 2023/07/13

The Lattice and Single, and cpu2013 have all been decomissioned. The Single

partition will be replaced by the nodes formerly in the cpu2013 partition but

will be called single.

************************************************************************
2023/07/13
--- Lattice, Single, cpu2013 partition changes

The Lattice and Single, and cpu2013 have all been decomissioned.  The Single

partition will be replaced by the nodes formerly in the cpu2013 partition but will be called single.

Open OnDemand reboot - 2023/10/17

Open OnDemand will be rebooted on October 17, 2023 for an update. It will be down for up to 30 minutes.

************************************************************************
2023/10/17
--- Open OnDemand reboot

Open OnDemand will be rebooted on October 17, 2023 for an update. It will be down for up to 30 minutes.

Storage Upgrade MARC/ARC cluster - 2023/10/23

We will be performing storage upgrades on the MARC/ARC cluster on

November 16 and 17, 2023. To facilitate this, we will be throttling down the number of jobs on both clusters while the upgrades are

performed

************************************************************************
2023/10/23
--- Storage Upgrade MARC/ARC cluster

We will be performing storage upgrades on the MARC/ARC cluster on

November 16 and 17, 2023. To facilitate this, we will be throttling down the number of jobs on both clusters while the upgrades are performed

Systems Operating Normally - 2024/05/3

************************************************************************
2024/05/3
--- Systems Operating Normally

Power Interruption - 2024/05/07

Arc Experienced an brief power outage around 11AM May 7, 2024.

Most compute nodes have or are rebooting. Most jobs running at this time were lost. Arc administrators are actively working on restarting compute

nodes. Sorry for the inconvenience.

************************************************************************
2024/05/07
--- Power Interruption

Arc Experienced an brief power outage around 11AM May 7, 2024.

Most compute nodes have or are rebooting. Most jobs running at this time were lost. Arc administrators are actively working on restarting compute nodes. Sorry for the inconvenience.

GPU a100 Node Reservation - 2024/06/03

Job submissions targeted to the GPU a100 partition will be

affected by a temporary reservation on the nodes to accommodate the RCS summer school class taking place on 2024/Jun/10. Reservation will end shortly after. Please submit your jobs normally and the scheduler will

start them as soon as the nodes are available. Sorry for the inconvenience.

************************************************************************
2024/06/03
--- GPU a100 Node Reservation

Job submissions targeted to the  GPU a100 partition will be

affected by a temporary reservation on the nodes to accommodate the RCS summer school class taking place on 2024/Jun/10. Reservation will end shortly after. Please submit your jobs normally and the scheduler will start them as soon as the nodes are available. Sorry for the inconvenience.

GPU a100 Node Reservation Removed - 2024/06/11

GPU a100 Nodes in ARC have been returned to normal scheduling.

************************************************************************
2024/06/11
--- GPU a100 Node Reservation Removed

GPU a100 Nodes in ARC have been returned to normal scheduling.

Notice of Upcoming Partial Outage - 2024/08/23

Several compute nodes from the ARC cluster will be unavailable

between Sept 23 to Sept 27 inclusive (subject to change). All compute nodes in cpu2019, cpu2021/2, gpu-v100 most nodes from bigmem and gpu-a100 will be

affected. These nodes will return to service as soon as the work is complete.

************************************************************************
2024/08/23
--- Notice of Upcoming Partial Outage

Several compute nodes from the ARC cluster will be unavailable

between Sept 23 to Sept 27 inclusive (subject to change). All compute nodes in cpu2019, cpu2021/2, gpu-v100 most nodes from bigmem and gpu-a100 will be affected. These nodes will return to service as soon as the work is complete.

Partial Outage Update I - 2024/09/25

Due to hardware issues that is blocking our original maintenance window, most compute nodes that were taken offline on Monday has been brought back online today. An additional partial outage will occur again starting next Tuesday for the same nodes.

On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024.

We apologise for the inconvenience.

************************************************************************
2024/09/25
--- Partial Outage Update I

Due to hardware issues that is blocking our original maintenance window, most compute nodes that were taken offline on Monday has been brought back online today. An additional partial outage will occur again starting next Tuesday for the same nodes.

On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024.

We apologise for the inconvenience.

Partial Outage Update II - 2024/10/04

The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.

Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].

We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.

************************************************************************
2024/10/04
--- Partial Outage Update II

The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.

Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].

We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.

Partial Outage Update III - 2024/10/07

Due to technical issues beyond our control the maintenance window will be extended until at least Tuesday, October 15, 2024.

Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].

We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.

************************************************************************
2024/10/07
--- Partial Outage Update III

Due to technical issues beyond our control the maintenance window will be extended until at least Tuesday, October 15, 2024.

Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].

We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.

Normal Scheduling has resumed. - 2024/10/08

The ARC cluster has been successfully brought online and nodes are running jobs normally. We apologize for the extended downtime. Please reach out to support@hpc.ucalgary.ca with any issues or concerns.

************************************************************************
2024/10/08
--- Normal Scheduling has resumed.

The ARC cluster has been successfully brought online and nodes are running jobs normally. We apologize for the extended downtime.

Please reach out to support@hpc.ucalgary.ca with any issues or concerns.

Scheduled Maintenance - 2024/12/11

The ARC login node will be rebooted on Tuesday December 17 for scheduled maintenance. It will be down for a few minutes and return shortly. Job scheduling and jobs running on the cluster will not be affected. Thank you for understanding. Please reach out to support@hpc.ucalgary.ca with any issues or concerns.

************************************************************************
2024/12/11
--- Scheduled Maintenance

The ARC login node will be rebooted on Tuesday December 17 for scheduled maintenance. It will be down for a few minutes and return shortly. Job scheduling and jobs running on the cluster will not be affected. Thank you for understanding.

Please reach out to support@hpc.ucalgary.ca with any issues or concerns.

Scheduled Maintenance and OS Update - 2025/01/07

The ARC cluster will be rebooted for OS updates on Monday January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back, but queued jobs will remain in the queue and nodes will start scheduling when the cluster is ready. Thank you for understanding. Please reach out to support@hpc.ucalgary.ca with any issues or concerns.

************************************************************************
2025/01/07
--- Scheduled Maintenance and OS Update

The ARC cluster will be rebooted for OS updates on Monday January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back, but queued jobs will remain in the queue and nodes will start scheduling when the cluster is ready. Thank you for understanding.

Please reach out to support@hpc.ucalgary.ca with any issues or concerns.

⚠️ Scheduled Maintenance and OS Update - 2025/01/15

The ARC cluster will be down for maintenance and upgrades starting 9AM Monday, January 20, 2025 through Wednesday, January 22, 2025.

For the duration of the upgrade window:

Scheduling will be paused and new jobs will be queued. Any queued jobs will start scheduling only after the upgrade is complete.
Access to files via the login node and arc-dtn will generally be available but intermittent. File transfers on the DTN node, including Globus file transfers, may be interrupted during this window.

Please make sure to save your work prior to this outage window to avoid any loss of work.

During this time the following changes will happen:

1. Ethernet will replace the 11 year old, unsupported Infiniband on the following partitions:

cpu2023 (temporary)
Parallel
Theia/Synergy/cpu2017-bf05
Single

Any multi-node jobs (MPI) running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, cpu2022.

2. A component of the NetApp filer will be replaced. Access to /bulk will be unavailable on Wednesday, January 22, 2025.

3. The compute node operating system will be updated to Rocky Linux 8.10.

4. The Slurm scheduling system will be upgraded.

5. The Open OnDemand web portal will be upgraded.

Please reach out to support@hpc.calgary.ca with any issues or concerns.

⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️ Update Jan 18, 2025

Around 10AM Arc experienced an electrical power brownout. Some percentage (how many is unknown at this time) of the nodes lost electrical power during this time causing a loss of a number of running jobs.

Sorry for the inconvenience.

Since Arc is shutting down for maintenance on Monday Jan 20, replacement jobs will likely not start unless they request a timelimit less than the time until 8AM Monday.

⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️

************************************************************************
2025/01/15
--- ⚠️ Scheduled Maintenance and OS Update

The ARC cluster will be down for maintenance and upgrades starting 9AM Monday, January 20, 2025 through Wednesday, January 22, 2025.

For the duration of the upgrade window:

Scheduling will be paused and new jobs will be queued. Any queued jobs will start scheduling only after the upgrade is complete.
Access to files via the login node and arc-dtn will generally be available but intermittent. File transfers on the DTN node, including Globus file transfers, may be interrupted during this window.

Please make sure to save your work prior to this outage window to avoid any loss of work.

During this time the following changes will happen:

1. Ethernet will replace the 11 year old, unsupported Infiniband on the following partitions:

cpu2023 (temporary)
Parallel
Theia/Synergy/cpu2017-bf05
Single

Any multi-node jobs (MPI) running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, cpu2022.

2. A component of the NetApp filer will be replaced. Access to /bulk will be unavailable on Wednesday, January 22, 2025.

3. The compute node operating system will be updated to Rocky Linux 8.10.

4. The Slurm scheduling system will be upgraded.

5. The Open OnDemand web portal will be upgraded.

Please reach out to support@hpc.calgary.ca with any issues or concerns.

⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️ Update Jan 18, 2025

Around 10AM Arc experienced an electrical power brownout. Some percentage (how many is unknown at this time) of the nodes lost electrical power during this time causing a loss of a number of running jobs.

Sorry for the inconvenience.

Since Arc is shutting down for maintenance on Monday Jan 20, replacement jobs will likely not start unless they request a timelimit less than the time until 8AM Monday. ⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️

Maintenance Complete - 2025/01/22

The ARC cluster upgrade is complete

During this time the following changes happened:

1. Ethernet will replace the 11 year old, unsupported Infiniband on the following partitions:

cpu2023 (temporary)
Parallel
Theia/Synergy/cpu2017-bf05
Single

Any multi-node jobs (MPI) running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, cpu2022.

2. A component of the NetApp filer was replaced successfully.

3. The compute node operating was updated to Rocky Linux 8.10.

4. The Slurm scheduling system was upgraded.

5. The Open OnDemand web portal was upgraded.

6. The Parallel partition was renamed to Legacy to show the lack of an interconnect for parallel MPI work and was restricted to maximum 4 node jobs.

Please reach out to support@hpc.calgary.ca with any issues or concerns.

Jan 23, 9:08AM

Remount complete. arc is back in full service.

************************************************************************
2025/01/22
--- Maintenance Complete

The ARC cluster upgrade is complete

During this time the following changes happened:

1. Ethernet will replace the 11 year old, unsupported Infiniband on the following partitions:

cpu2023 (temporary)
Parallel
Theia/Synergy/cpu2017-bf05
Single

Any multi-node jobs (MPI) running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, cpu2022.

2. A component of the NetApp filer was replaced successfully.

3. The compute node operating was updated to Rocky Linux 8.10.

4. The Slurm scheduling system was upgraded.

5. The Open OnDemand web portal was upgraded.

6. The Parallel partition was renamed to Legacy to show the lack of an interconnect for parallel MPI work and was restricted to maximum 4 node jobs.

Please reach out to support@hpc.calgary.ca with any issues or concerns.

Jan 23, 9:08AM

Remount complete. arc is back in full service.

Module Command Upgrade - 2025/02/03

Upgrade of the module command On Tuesday, February 11, 2025 the module command will be upgraded to a new verson on Arc. This should result in new capabilities and a slightly different visual experience when using the module command. Loading modules is not expected to change.

************************************************************************
2025/02/03
--- Module Command Upgrade

Upgrade of the module command

On Tuesday, February 11, 2025 the module command will be upgraded to a new verson on Arc. This should result in new capabilities and a slightly different visual experience when using the module command. Loading modules is not expected to change.

Module Command Upgraded - 2025/02/12

The module command was upgraded On Tuesday, February 12, 2025 the module command was upgraded to a new verson on Arc. This should result in new capabilities and a slightly different visual experience when using the module command. Loading modules should not have changed.

************************************************************************
2025/02/12
--- Module Command Upgraded

The module command was upgraded

On Tuesday, February 12, 2025 the module command was upgraded to a new verson on Arc. This should result in new capabilities and a slightly different visual experience when using the module command. Loading modules should not have changed.

Support email address down - 2025/03/07

support@hpc.ucalgary.ca Unavailable

Please be informed that our support email address (support@hpc.ucalgary.ca) for RCS is currently not working. We are working to bring it back as soon as possible. Please keep an eye on this space for updates. The clusters are working normally, but support will not receive your messages at this time. We will begin responding as soon as we can get it back.

Apologies for the inconvenience.

************************************************************************
2025/03/07
--- Support email address down

support@hpc.ucalgary.ca Unavailable

Please be informed that our support email address (support@hpc.ucalgary.ca) for RCS is currently not working. We are working to bring it back as soon as possible. Please keep an eye on this space for updates. The clusters are working normally, but support will not receive your messages at this time. We will begin responding as soon as we can get it back.

Apologies for the inconvenience.

Support email address functional - 2025/03/07

support@hpc.ucalgary.ca is back

support@hpc.ucalgary.ca has been repaired and RCS can be contacted there. If you had reached out for assistance in recent days without response please follow up as we may not have received your initial email.

Apologies for the inconvenience.

************************************************************************
2025/03/07
--- Support email address functional

support@hpc.ucalgary.ca is back

support@hpc.ucalgary.ca has been repaired and RCS can be contacted there. If you had reached out for assistance in recent days without response please follow up as we may not have received your initial email.

Apologies for the inconvenience.

Interactive Job Timelimit will be Enforced - 2025/04/11

In order to improve the scheduling and job throughput efficiency of ARC, interactive jobs will be limited to a maximum of 5 hours of runtime. Interactive jobs that are submitted with a timelimit over 5 hours will be rejected at submission time. This change will be made on Monday, April 28, 2025.

************************************************************************
2025/04/11
--- Interactive Job Timelimit will be Enforced

In order to improve the scheduling and job throughput efficiency of ARC, interactive jobs will be limited to a maximum of 5 hours of runtime.  Interactive jobs that are submitted with a timelimit over 5 hours will be rejected at submission time.  This change will be made on Monday, April 28, 2025.

Interactive Job Timelimit Is Now Enforced - 2025/04/28

In order to improve the scheduling and job throughput efficiency of ARC, interactive jobs are now limited to a maximum of 5 hours of runtime. Interactive jobs that are submitted with a timelimit over 5 hours will be rejected at submission time.

Apr 29, 2025

To increase the security posture of the Arc cluster administrators will be installing Trend Micro on cluster login nodes over the week starting Apr 30. Please report any inconsistencies to support@hpc.ucalgary.ca

************************************************************************
2025/04/28
--- Interactive Job Timelimit Is Now Enforced

In order to improve the scheduling and job throughput efficiency of ARC, interactive jobs are now limited to a maximum of 5 hours of runtime.  Interactive jobs that are submitted with a timelimit over 5 hours will be rejected at submission time.

Apr 29, 2025 To increase the security posture of the Arc cluster administrators will be installing Trend Micro on cluster login nodes over the week starting Apr 30. Please report any inconsistencies to support@hpc.ucalgary.ca

⚠️ Bulk Filesystem Emergency Maintenance - 2025/07/29

The filer that provides the /bulk filesystem will be down for emergency repairs at 12 Noon on Thursday July 31. No access to files on /bulk will be possible for the duration of the multi-hour outage. Any jobs running that access /bulk will start and then pause when access to /bulk is attempted. Jobs should continue once service is restored. Sorry for the inconvenience.

************************************************************************
2025/07/29
--- ⚠️ Bulk Filesystem Emergency Maintenance

The filer that provides the /bulk filesystem will be down for emergency repairs at 12 Noon on Thursday July 31.  No access to files on /bulk will be possible for the duration of the multi-hour outage.  Any jobs running that access /bulk will start and then pause when access to /bulk is attempted.  Jobs should continue once service is restored.  Sorry for the inconvenience.

Bulk Filesystem Maintenance Complete - 2025/08/01

Maintenance on /bulk was completed successfully and all filesystems are back in service on Arc.

************************************************************************
2025/08/01
--- Bulk Filesystem Maintenance Complete

Maintenance on /bulk was completed successfully and all filesystems are back in service on Arc.

GPU partition changes on ARC - 2025/08/15

New gpu-h100 and gpu-l40 partitions are available for general scheduling with new gpu hardware. gpu-v100 also has 6 fewer nodes. You can view more details about the node specs by running the arc.hardware script on the login node.

************************************************************************
2025/08/15
--- GPU partition changes on ARC

New gpu-h100 and gpu-l40 partitions are available for general scheduling with new gpu hardware. gpu-v100 also has 6 fewer nodes. You can view more details about the node specs by running the arc.hardware script on the login node.

Update and Bulk outage on ARC - 2025/09/15

⚠️ Arc Update September 15-19

An update of the Arc cluster will begin on Sep 15. This will result in fewer resources while the compute nodes are restarted. The login node and Scheduler will be restarted on Sep 17.

⚠️ Bulk Filesystem Maintenance September 17

The filer that provides the /bulk filesystem will be down for emergency repairs at 9 AM on Wednesday Sep 17th. No access to files on /bulk will be possible for the duration of the multi-hour outage. Any jobs running that access /bulk will start and then pause when access to /bulk is attempted. Jobs should continue once service is restored. We apologize for any inconvenience.

************************************************************************
2025/09/15
--- Update and Bulk outage on ARC

⚠️ Arc Update September 15-19

An update of the Arc cluster will begin on Sep 15. This will result in fewer resources while the compute nodes are restarted. The login node and Scheduler will be restarted on Sep 17.

⚠️ Bulk Filesystem Maintenance September 17 The filer that provides the /bulk filesystem will be down for emergency repairs at 9 AM on Wednesday Sep 17th. No access to files on /bulk will be possible for the duration of the multi-hour outage. Any jobs running that access /bulk will start and then pause when access to /bulk is attempted. Jobs should continue once service is restored. We apologize for any inconvenience.

Update and Bulk outage on ARC - 2025/09/18

Arc update complete The update to arc is complete.

************************************************************************
2025/09/18
--- Update and Bulk outage on ARC

Arc update complete

The update to arc is complete.

GPU partition changes on ARC - 2025/09/23

gpu-v100 has 6 more nodes. You can view more details about the node specs by running the arc.hardware script on the login node.

************************************************************************
2025/09/23
--- GPU partition changes on ARC

gpu-v100 has 6 more nodes. You can view more details about the node specs by running the arc.hardware script on the login node.

Systems Operating Normally - 2025/11/4

************************************************************************
2025/11/4
--- Systems Operating Normally

Legacy compute nodes are being retired - 2025/11/20

ARC nodes cn[0513-1096] are being removed from the arc cluster. They will be removed from scheduling and removed from the cluster over the next while.

************************************************************************
2025/11/20
--- Legacy compute nodes are being retired

ARC nodes cn[0513-1096] are being removed from the arc cluster. They will be removed from scheduling and removed from the cluster over the next while.

ARC Cluster Status

System Messages

⚠️ January System Updates - 2023/01/01

System Updates Completed - 2023/01/24

Filesystem Issues - 2023/02/28

Filesystem Issues - 2023/03/1

ARC Login node reboot - 2023/03/2

⚠️ Filesystem Issues - 2023/03/2

Filesystem Issues Resolved - 2023/03/10

Open OnDemand reboot - 2023/05/01

Apptainer (Singularity) on ARC Login Node - 2023/06/22

Lattice, Single, cpu2013 partition changes - 2023/07/13

Open OnDemand reboot - 2023/10/17

Storage Upgrade MARC/ARC cluster - 2023/10/23

Systems Operating Normally - 2024/05/3

Power Interruption - 2024/05/07

GPU a100 Node Reservation - 2024/06/03

GPU a100 Node Reservation Removed - 2024/06/11

Notice of Upcoming Partial Outage - 2024/08/23

Partial Outage Update I - 2024/09/25

Partial Outage Update II - 2024/10/04

Partial Outage Update III - 2024/10/07

Normal Scheduling has resumed. - 2024/10/08

Scheduled Maintenance - 2024/12/11

Scheduled Maintenance and OS Update - 2025/01/07

⚠️ Scheduled Maintenance and OS Update - 2025/01/15

Maintenance Complete - 2025/01/22

Module Command Upgrade - 2025/02/03

Module Command Upgraded - 2025/02/12

Support email address down - 2025/03/07

Support email address functional - 2025/03/07

Interactive Job Timelimit will be Enforced - 2025/04/11

Interactive Job Timelimit Is Now Enforced - 2025/04/28

⚠️ Bulk Filesystem Emergency Maintenance - 2025/07/29

Bulk Filesystem Maintenance Complete - 2025/08/01

GPU partition changes on ARC - 2025/08/15

Update and Bulk outage on ARC - 2025/09/15

Update and Bulk outage on ARC - 2025/09/18

GPU partition changes on ARC - 2025/09/23

Systems Operating Normally - 2025/11/4

Legacy compute nodes are being retired - 2025/11/20

Navigation menu

Search