<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://rcs.ucalgary.ca/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Cemagata</id>
	<title>RCSWiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://rcs.ucalgary.ca/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Cemagata"/>
	<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/Special:Contributions/Cemagata"/>
	<updated>2026-04-10T11:06:13Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.43.3</generator>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=4038</id>
		<title>ARC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=4038"/>
		<updated>2026-02-20T21:56:02Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 16, 2023, the ARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The ARC login node will reboot on the morning of January 16. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by January 20.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/24&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS Updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/02/28&lt;br /&gt;
| message =&lt;br /&gt;
We are currently investigating a filesystem issue that is causing filesystem slowdowns across ARC.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/03/01&lt;br /&gt;
| message =&lt;br /&gt;
We are still investigating a filesystem issue that is causing filesystem slowdowns across ARC. Some jobs on ARC have been paused to help us find the root cause of the slowdowns.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ARC Login node reboot&lt;br /&gt;
| date = 2023/03/02&lt;br /&gt;
| message =&lt;br /&gt;
The ARC login node will be rebooted this afternoon for emergency maintenance. This downtime is needed to help mitigate the filesystem slowdowns experienced on the login node.  Jobs will continue running and scheduling during this time.&lt;br /&gt;
&lt;br /&gt;
All sessions on the ARC login node will be terminated at 3:00PM, and the node will remain unavailable until 4:00PM.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Filesystem Issues&lt;br /&gt;
| date = 2023/03/02&lt;br /&gt;
| message =&lt;br /&gt;
We are still investigating a filesystem issue that is causing filesystem slowdowns on specific nodes in our MSRDC location.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues Resolved&lt;br /&gt;
| date = 2023/03/10&lt;br /&gt;
| message =&lt;br /&gt;
We have upgraded the filesystem routers in our MSRDC location to address the performance issues.&lt;br /&gt;
&lt;br /&gt;
Please let us know if you experience any issues with the filesystem performance.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/05/01&lt;br /&gt;
| message =&lt;br /&gt;
On May 1, 2023, the ARC Open OnDemand node will be rebooted between 5PM and 6PM. Expected downtime will be approximately 15 minutes.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on ARC Login Node&lt;br /&gt;
| date = 2023/06/22&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the Arc login node. If apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to a security vulnerability that will be&lt;br /&gt;
patched soon.&lt;br /&gt;
&lt;br /&gt;
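A minimal sketch of this workaround, assuming a hypothetical image file mycontainer.sif in&lt;br /&gt;
your home directory (the partition, resources, and image path are illustrative):&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --partition=single     # illustrative; use whatever partition you normally would&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --mem=4G&lt;br /&gt;
# User namespaces are available on compute nodes, so apptainer runs normally here.&lt;br /&gt;
apptainer exec ~/mycontainer.sif echo "ran inside a job"&lt;br /&gt;
&lt;/pre&gt;&lt;br /&gt;
Submit with sbatch as usual; the same applies to interactive salloc/srun sessions.&lt;br /&gt;
&lt;br /&gt;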
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Lattice, Single, cpu2013 partition changes&lt;br /&gt;
| date = 2023/07/13&lt;br /&gt;
| message =&lt;br /&gt;
The Lattice, Single, and cpu2013 partitions have all been decommissioned.  The Single&lt;br /&gt;
partition will be replaced by the nodes formerly in the cpu2013 partition, but&lt;br /&gt;
will still be called single.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/10/17&lt;br /&gt;
| message =&lt;br /&gt;
Open OnDemand will be rebooted on October 17, 2023 for an update. It will be down for up to 30 minutes.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Storage Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2023/10/23&lt;br /&gt;
| message =&lt;br /&gt;
We will be performing storage upgrades on the MARC/ARC cluster on &lt;br /&gt;
November 16 and 17, 2023. To facilitate this, we will be throttling &lt;br /&gt;
down the number of jobs on both clusters while the upgrades are &lt;br /&gt;
performed.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/05/03&lt;br /&gt;
| message =&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Power Interruption&lt;br /&gt;
| date = 2024/05/07&lt;br /&gt;
| message = Arc experienced a brief power outage around 11AM on May 7, 2024.&lt;br /&gt;
Most compute nodes have rebooted or are rebooting.  Most jobs running at the time &lt;br /&gt;
were lost. Arc administrators are actively working on restarting compute &lt;br /&gt;
nodes. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation&lt;br /&gt;
| date = 2024/06/03&lt;br /&gt;
| message = Job submissions targeted to the GPU a100 partition will be &lt;br /&gt;
affected by a temporary reservation on the nodes to accommodate the RCS&lt;br /&gt;
summer school class taking place on 2024/Jun/10. The reservation will end &lt;br /&gt;
shortly after. Please submit your jobs normally and the scheduler will &lt;br /&gt;
start them as soon as the nodes are available. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation Removed&lt;br /&gt;
| date = 2024/06/11&lt;br /&gt;
| message = GPU a100 Nodes in ARC have been returned to normal scheduling. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/23&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable &lt;br /&gt;
between Sept 23 and Sept 27 inclusive (subject to change). All compute nodes &lt;br /&gt;
in cpu2019, cpu2021/2, and gpu-v100, plus most nodes from bigmem and gpu-a100, will be &lt;br /&gt;
affected. These nodes will return to service as soon as the work is complete.  &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues blocking our original maintenance window, most compute nodes that were taken offline on Monday have been brought back online today. An additional partial outage will occur starting next Tuesday for the same nodes.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control, the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Normal Scheduling has resumed. &lt;br /&gt;
| date = 2024/10/08&lt;br /&gt;
| message = The ARC cluster has been successfully brought online and nodes are running jobs normally. We apologize for the extended downtime. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance &lt;br /&gt;
| date = 2024/12/11&lt;br /&gt;
| message = The ARC login node will be rebooted on Tuesday December 17 for scheduled maintenance. It will be down for a few minutes and return shortly. Job scheduling and jobs running on the cluster will not be affected. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/07&lt;br /&gt;
| message = The ARC cluster will be rebooted for OS updates on Monday January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back, but queued jobs will remain in the queue and nodes will start scheduling when the cluster is ready. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/15&lt;br /&gt;
| message = The ARC cluster will be down for maintenance and upgrades from 9AM Monday, January 20, 2025 through Wednesday, January 22, 2025. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the duration of the upgrade window:&lt;br /&gt;
* Scheduling will be paused and new jobs will be queued. Any queued jobs will start scheduling only after the upgrade is complete.&lt;br /&gt;
* Access to files via the login node and arc-dtn will generally be available but intermittent. File transfers on the DTN node, including Globus file transfers, may be interrupted during this window.&lt;br /&gt;
&lt;br /&gt;
Please make sure to save your work prior to this outage window to avoid any loss of work.&lt;br /&gt;
&lt;br /&gt;
During this time the following changes will happen:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet will replace the 11-year-old, unsupported InfiniBand on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node jobs (MPI) running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, cpu2022.&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer will be replaced. Access to /bulk will be unavailable on Wednesday, January 22, 2025.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system will be updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system will be upgraded.&lt;br /&gt;
&lt;br /&gt;
5. The Open OnDemand web portal will be upgraded.&lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns.&lt;br /&gt;
&lt;br /&gt;
⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
Update Jan 18, 2025&lt;br /&gt;
&lt;br /&gt;
Around 10AM, Arc experienced an electrical power brownout.  An as-yet-unknown percentage of the nodes lost electrical power during this time, causing the loss of a number of running jobs.  &lt;br /&gt;
&lt;br /&gt;
Sorry for the inconvenience.  &lt;br /&gt;
&lt;br /&gt;
Since Arc is shutting down for maintenance on Monday Jan 20, replacement jobs will likely not start unless they request a timelimit less than the time until 8AM Monday.  &lt;br /&gt;
⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Maintenance Complete&lt;br /&gt;
| date = 2025/01/22&lt;br /&gt;
| message = The ARC cluster upgrade is complete.&lt;br /&gt;
&lt;br /&gt;
During this time the following changes happened:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet replaced the 11-year-old, unsupported InfiniBand on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node jobs (MPI) running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, cpu2022.&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer was replaced successfully.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system was updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system was upgraded.&lt;br /&gt;
&lt;br /&gt;
5. The Open OnDemand web portal was upgraded.&lt;br /&gt;
&lt;br /&gt;
6. The Parallel partition was renamed to Legacy, reflecting the lack of an interconnect suited to parallel MPI work, and was restricted to jobs of at most 4 nodes.&lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns.&lt;br /&gt;
&lt;br /&gt;
Jan 23, 9:08AM&lt;br /&gt;
&lt;br /&gt;
Remount complete.  Arc is back in full service.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Module Command Upgrade&lt;br /&gt;
| date = 2025/02/03&lt;br /&gt;
| message = Upgrade of the module command&lt;br /&gt;
&lt;br /&gt;
On Tuesday, February 11, 2025, the module command will be upgraded to a new version on Arc. This should result in new capabilities and a slightly different visual experience when using the module command.  Loading modules is not expected to change.&lt;br /&gt;
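&lt;br /&gt;
Day-to-day usage should look the same; a typical session (the module name here is illustrative) would be:&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;&lt;br /&gt;
module avail           # list available software&lt;br /&gt;
module load python     # load a module exactly as before&lt;br /&gt;
&lt;/pre&gt;&lt;br /&gt;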
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Module Command Upgraded&lt;br /&gt;
| date = 2025/02/12&lt;br /&gt;
| message = The module command was upgraded&lt;br /&gt;
&lt;br /&gt;
On Tuesday, February 12, 2025, the module command was upgraded to a new version on Arc. This should result in new capabilities and a slightly different visual experience when using the module command.  Loading modules should not have changed.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address down&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca Unavailable&lt;br /&gt;
&lt;br /&gt;
Please be informed that our support email address (support@hpc.ucalgary.ca) for RCS is currently not working. We are working to bring it back as soon as possible. Please keep an eye on this space for updates. The clusters are working normally, but support will not receive your messages at this time. We will begin responding as soon as service is restored. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address functional&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca is back&lt;br /&gt;
&lt;br /&gt;
support@hpc.ucalgary.ca has been repaired and RCS can be contacted there. If you had reached out for assistance in recent days without response please follow up as we may not have received your initial email. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Interactive Job Timelimit will be Enforced&lt;br /&gt;
| date = 2025/04/11&lt;br /&gt;
| message = In order to improve the scheduling and job throughput efficiency of ARC, interactive jobs will be limited to a maximum of 5 hours of runtime.  Interactive jobs that are submitted with a timelimit over 5 hours will be rejected at submission time.  This change will be made on Monday, April 28, 2025.&lt;br /&gt;
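&lt;br /&gt;
As a sketch, an interactive session that fits under the cap (partition and resources illustrative) could be requested with:&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;&lt;br /&gt;
salloc --time=04:00:00 --ntasks=1 --mem=4G --partition=single&lt;br /&gt;
&lt;/pre&gt;&lt;br /&gt;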
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Interactive Job Timelimit Is Now Enforced&lt;br /&gt;
| date = 2025/04/28&lt;br /&gt;
| message = In order to improve the scheduling and job throughput efficiency of ARC, interactive jobs are now limited to a maximum of 5 hours of runtime.  Interactive jobs that are submitted with a timelimit over 5 hours will be rejected at submission time.&lt;br /&gt;
&lt;br /&gt;
Apr 29, 2025&lt;br /&gt;
To increase the security posture of the Arc cluster, administrators will be installing Trend Micro on cluster login nodes during the week starting Apr 30.  Please report any inconsistencies to support@hpc.ucalgary.ca&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Bulk Filesystem Emergency Maintenance&lt;br /&gt;
| date = 2025/07/29&lt;br /&gt;
| message = The filer that provides the /bulk filesystem will be down for emergency repairs at 12 Noon on Thursday July 31.  No access to files on /bulk will be possible for the duration of the multi-hour outage.  Any jobs running that access /bulk will start and then pause when access to /bulk is attempted.  Jobs should continue once service is restored.  Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Bulk Filesystem Maintenance Complete&lt;br /&gt;
| date = 2025/08/01&lt;br /&gt;
| message = Maintenance on /bulk was completed successfully and all filesystems are back in service on Arc.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU partition changes on ARC&lt;br /&gt;
| date = 2025/08/15&lt;br /&gt;
| message = New gpu-h100 and gpu-l40 partitions are available for general scheduling with new GPU hardware. gpu-v100 also has 6 fewer nodes. You can view more details about the node specs by running the arc.hardware script on the login node. &lt;br /&gt;
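&lt;br /&gt;
For example, from the login node (arc.hardware is the site script named above, so its output format is site-specific; sinfo is standard Slurm):&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;&lt;br /&gt;
arc.hardware                                       # summarize node specs&lt;br /&gt;
sinfo --partition=gpu-h100,gpu-l40 --Node --long   # list nodes in the new partitions&lt;br /&gt;
&lt;/pre&gt;&lt;br /&gt;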
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Update and Bulk outage on ARC&lt;br /&gt;
| date = 2025/09/15&lt;br /&gt;
| message = ⚠️ Arc Update September 15-19&lt;br /&gt;
&lt;br /&gt;
An update of the Arc cluster will begin on Sep 15.  This will result in fewer available resources while the compute nodes are restarted.  The login node and scheduler will be restarted on Sep 17.&lt;br /&gt;
&lt;br /&gt;
⚠️ Bulk Filesystem Maintenance September 17&lt;br /&gt;
The filer that provides the /bulk filesystem will be down for emergency repairs at 9 AM on Wednesday Sep 17th. No access to files on /bulk will be possible for the duration of the multi-hour outage. Any jobs running that access /bulk will start and then pause when access to /bulk is attempted. Jobs should continue once service is restored.  We apologize for any inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Update and Bulk outage on ARC&lt;br /&gt;
| date = 2025/09/18&lt;br /&gt;
| message = Arc update complete&lt;br /&gt;
&lt;br /&gt;
The update to Arc is complete.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU partition changes on ARC&lt;br /&gt;
| date = 2025/09/23&lt;br /&gt;
| message = gpu-v100 has 6 more nodes. You can view more details about the node specs by running the arc.hardware script on the login node. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2025/11/04&lt;br /&gt;
| message =&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Legacy compute nodes are being retired&lt;br /&gt;
| date = 2025/11/20&lt;br /&gt;
| message = ARC nodes cn[0513-1096] are being removed from the ARC cluster. They will be removed from scheduling, and then from the cluster, over the next while. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Planned outage in Jan 2026&lt;br /&gt;
| date = 2025/12/26&lt;br /&gt;
| message = ⚠️ Arc Update Week of January 12&lt;br /&gt;
&lt;br /&gt;
ARC will be going down periodically during the week of 2026/Jan/12 to allow for system maintenance. All nodes and filesystems will be unavailable for use during the outage, and filesystem performance may be variable for a time following the upgrade, until the changes finish. &lt;br /&gt;
&lt;br /&gt;
If you have questions or concerns, please contact support@hpc.ucalgary.ca&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Arc Filesystem Returned to service, scheduling outages continue&lt;br /&gt;
| date = 2026/01/13&lt;br /&gt;
| message = ⚠️ Data access now possible ⚠️&lt;br /&gt;
&lt;br /&gt;
The first stage of the maintenance is complete and the filesystems are once again available.  &lt;br /&gt;
&lt;br /&gt;
There is still significant work to do on the cluster, so jobs will not start until this maintenance is complete, but clients will be able to access data via the login node and DTN.&lt;br /&gt;
&lt;br /&gt;
If you have questions or concerns, please contact support@hpc.ucalgary.ca&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Arc Returned to service, scheduling resumed&lt;br /&gt;
| date = 2026/01/16&lt;br /&gt;
| message = Maintenance Complete&lt;br /&gt;
&lt;br /&gt;
Arc has been returned to service.&lt;br /&gt;
&lt;br /&gt;
If you have questions or concerns, please contact support@hpc.ucalgary.ca&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = New Compute nodes added to Arc&lt;br /&gt;
| date = 2026/01/22&lt;br /&gt;
| message = cpu2025 partition is now available.&lt;br /&gt;
&lt;br /&gt;
New nodes have been installed to replace the aging parallel nodes. They have 1 TiB of memory and 128 CPUs.&lt;br /&gt;
&lt;br /&gt;
If you have questions or concerns, please contact support@hpc.ucalgary.ca&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Outage Feb 23, 2026 Starting at 7AM&lt;br /&gt;
| date = 2026/02/17&lt;br /&gt;
| message = ⚠️ Filesystem Outage Affecting Arc, Marc, Talc, Cloudstack&lt;br /&gt;
&lt;br /&gt;
Filesystems will be unavailable Feb 23 starting at 7AM MST.  This outage is expected to end prior to 8PM.  No logins or file access will be possible until the outage is complete.&lt;br /&gt;
&lt;br /&gt;
Any queued jobs must request timelimits less than the remaining time until Feb 23, 7AM, or they will wait until the outage is complete.&lt;br /&gt;
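&lt;br /&gt;
As a sketch (script name and numbers illustrative), a job submitted three days before the outage would need something like:&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;&lt;br /&gt;
sbatch --time=2-12:00:00 job.sh   # 2 days 12 hours, finishing before Feb 23, 7AM&lt;br /&gt;
&lt;/pre&gt;&lt;br /&gt;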
&lt;br /&gt;
Sorry for the inconvenience.  &lt;br /&gt;
&lt;br /&gt;
As of Friday, Feb 20, 2026: due to configuration issues on the network, Fluent is not available on ARC at this time. &lt;br /&gt;
We are working to fix the issues and the software will return as soon as possible. Please try resubmitting any affected jobs later.&lt;br /&gt;
&lt;br /&gt;
If you have questions or concerns, please contact support@hpc.ucalgary.ca&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Navbox ARC}}&lt;br /&gt;
[[Category:ARC]]&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=Altis_Login_Node_Status&amp;diff=3959</id>
		<title>Altis Login Node Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=Altis_Login_Node_Status&amp;diff=3959"/>
		<updated>2025-12-16T19:46:34Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/09/03&lt;br /&gt;
| message =&lt;br /&gt;
The ARC Cluster and the Altis login node are operational. No upcoming upgrades are planned.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/27&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable &lt;br /&gt;
between Sept 23 and Sept 27 inclusive (subject to change). Some Altis GPU nodes will be affected during this maintenance window. These nodes will return to service as soon as the work is complete.  &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues blocking our original maintenance window, most compute nodes that were taken offline on Monday have been brought back online today. An additional partial outage will occur starting next Tuesday for the same nodes.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control, the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Normal Scheduling has resumed. &lt;br /&gt;
| date = 2024/10/08&lt;br /&gt;
| message = The ARC cluster has been successfully brought online and nodes are running jobs normally. We apologize for the extended downtime. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = wdfgpu[1-12] System Update Reboots &lt;br /&gt;
| date = 2024/12/02&lt;br /&gt;
| message = wdfgpu[1-12] will be updated today for a short reboot to install important system updates and will return shortly. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/07&lt;br /&gt;
| message = The ARC cluster will be rebooted for OS updates on Monday January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back, but queued jobs will remain in the queue and nodes will start scheduling when the cluster is ready. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/15&lt;br /&gt;
| message = The ARC cluster will be down for maintenance and upgrades starting 9AM Monday, January 20, 2025 through Wednesday, January 22, 2025. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the duration of the upgrade window:&lt;br /&gt;
* Scheduling will be paused and new jobs will be queued. Any queued jobs will start scheduling only after the upgrade is complete.&lt;br /&gt;
* Access to files via the login node and arc-dtn will generally be available but may be intermittent. File transfers on the DTN node, including Globus file transfers, may be interrupted during this window.&lt;br /&gt;
&lt;br /&gt;
Please make sure to save your work prior to this outage window to avoid any loss of work.&lt;br /&gt;
&lt;br /&gt;
During this time the following changes will happen:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet will replace the 11-year-old, unsupported InfiniBand interconnect on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node (MPI) jobs running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to use a partition such as cpu2019, cpu2021, or cpu2022, as in the sketch below.&lt;br /&gt;
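&lt;br /&gt;
A minimal multi-node batch script might look like this (the partition, node count, task count, and time values are illustrative only, and my_mpi_program is a placeholder):&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --partition=cpu2019      # a partition that retains a low-latency interconnect&lt;br /&gt;
 #SBATCH --nodes=2&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --time=01:00:00&lt;br /&gt;
 srun ./my_mpi_program            # srun launches the MPI tasks across the allocated nodes&lt;br /&gt;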
&lt;br /&gt;
2. A component of the NetApp filer will be replaced. Access to /bulk will be unavailable on Wednesday, January 22, 2025.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system will be updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system will be upgraded.&lt;br /&gt;
&lt;br /&gt;
5. The Open OnDemand web portal will be upgraded.&lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns.&lt;br /&gt;
&lt;br /&gt;
⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
Update Jan 18, 2025&lt;br /&gt;
&lt;br /&gt;
Around 10AM Altis experienced an electrical power brownout.  An as-yet-unknown number of nodes lost electrical power during this time, causing the loss of a number of running jobs.  &lt;br /&gt;
&lt;br /&gt;
Sorry for the inconvenience.  &lt;br /&gt;
&lt;br /&gt;
Since Altis is shutting down for maintenance on Monday Jan 20, replacement jobs will likely not start unless they request a time limit shorter than the time remaining until 8AM Monday.  &lt;br /&gt;
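&lt;br /&gt;
For example, an explicit time limit can be given at submission time (the 12-hour value and the script name job.sh below are illustrative):&lt;br /&gt;
&lt;br /&gt;
 sbatch --time=12:00:00 job.sh   # request at most 12 hours so the job can finish before the Monday shutdown&lt;br /&gt;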
⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Maintenance Complete&lt;br /&gt;
| date = 2025/01/22&lt;br /&gt;
| message = The ARC/Altis cluster upgrade is complete.&lt;br /&gt;
&lt;br /&gt;
During this time the following changes happened:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet replaced the 11-year-old, unsupported InfiniBand interconnect on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node (MPI) jobs running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to use a partition such as cpu2019, cpu2021, or cpu2022.&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer was replaced successfully.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system was updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system was upgraded.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address down&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca Unavailable&lt;br /&gt;
&lt;br /&gt;
Please be informed that our support email address (support@hpc.ucalgary.ca) for RCS is currently not working. We are working to restore it as soon as possible. Please keep an eye on this space for updates. The clusters are working normally, but support will not receive your messages at this time. We will begin responding as soon as the address is restored. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address functional&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca is back&lt;br /&gt;
&lt;br /&gt;
support@hpc.ucalgary.ca has been repaired and RCS can be contacted there. If you had reached out for assistance in recent days without response please follow up as we may not have received your initial email. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Trend Micro Installation&lt;br /&gt;
| date = 2025/04/29&lt;br /&gt;
| message = To increase the security posture of the Arc cluster, administrators will be installing Trend Micro on the cluster login nodes over the week starting Apr 30.  Please report any issues to support@hpc.ucalgary.ca.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Update on ARC&lt;br /&gt;
| date = 2025/09/15&lt;br /&gt;
| message = ⚠️ Arc Update September 15-19&lt;br /&gt;
&lt;br /&gt;
An update of the Arc cluster will begin on Sep 15.  This will result in fewer available resources while the compute nodes are restarted.  The login node and scheduler will be restarted on Sep 17.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Update on Altis&lt;br /&gt;
| date = 2025/09/15&lt;br /&gt;
| message = ⚠️ Altis Update September 15-19&lt;br /&gt;
&lt;br /&gt;
An update of the Altis (Arc) cluster will begin on Sep 15.  This will result in fewer available resources while the compute nodes are restarted.  The login node and scheduler will be restarted on Sep 17.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Update on Altis&lt;br /&gt;
| date = 2025/09/15&lt;br /&gt;
| message = Update complete&lt;br /&gt;
&lt;br /&gt;
The update of the Altis (Arc) cluster is complete.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Planned outage in Jan 2026&lt;br /&gt;
| date = 2025/12/26&lt;br /&gt;
| message = ⚠️ ARC Update Week of January 12&lt;br /&gt;
&lt;br /&gt;
ARC will be going down periodically during the week of 2026/Jan/12 to allow for system maintenance. All nodes and filesystems will be unavailable during the outage, and filesystem performance may be variable for a time after the upgrade until the changes are complete. &lt;br /&gt;
&lt;br /&gt;
If you have questions or concerns, please contact support@hpc.ucalgary.ca&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
[[Category:ARC]]&lt;br /&gt;
{{Navbox ARC}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=Think_Login_Node_Status&amp;diff=3958</id>
		<title>Think Login Node Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=Think_Login_Node_Status&amp;diff=3958"/>
		<updated>2025-12-16T19:46:18Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/09/03&lt;br /&gt;
| message =&lt;br /&gt;
The ARC cluster and the Think login node are operational. No upcoming upgrades are planned.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/23&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable &lt;br /&gt;
from Sept 23 to Sept 27 inclusive (subject to change). Some Think GPU nodes will be affected during this maintenance window. These nodes will return to service as soon as the work is complete.  &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues that are blocking our original maintenance window, most compute nodes that were taken offline on Monday have been brought back online today. An additional partial outage for the same nodes will begin next Tuesday.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control, the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Normal Scheduling has resumed. &lt;br /&gt;
| date = 2024/10/08&lt;br /&gt;
| message = The ARC cluster has been successfully brought online and nodes are running jobs normally. We apologize for the extended downtime. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = wdfgpu[1-12] System Update Reboots &lt;br /&gt;
| date = 2024/12/02&lt;br /&gt;
| message = wdfgpu[1-12] will be rebooted briefly today to install important system updates and will return to service shortly. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/07&lt;br /&gt;
| message = The ARC cluster will be rebooted for OS updates on Monday January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back, but queued jobs will remain in the queue and nodes will start scheduling when the cluster is ready. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/15&lt;br /&gt;
| message = The ARC cluster will be down for maintenance and upgrades starting 9AM Monday, January 20, 2025 through Wednesday, January 22, 2025. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the duration of the upgrade window:&lt;br /&gt;
* Scheduling will be paused and new jobs will be queued. Any queued jobs will start scheduling only after the upgrade is complete.&lt;br /&gt;
* Access to files via the login node and arc-dtn will generally be available but may be intermittent. File transfers on the DTN node, including Globus file transfers, may be interrupted during this window.&lt;br /&gt;
&lt;br /&gt;
Please make sure to save your work prior to this outage window to avoid any loss of work.&lt;br /&gt;
&lt;br /&gt;
During this time the following changes will happen:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet will replace the 11-year-old, unsupported InfiniBand interconnect on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node (MPI) jobs running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to use a partition such as cpu2019, cpu2021, or cpu2022.&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer will be replaced. Access to /bulk will be unavailable on Wednesday, January 22, 2025.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system will be updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system will be upgraded.&lt;br /&gt;
&lt;br /&gt;
5. The Open OnDemand web portal will be upgraded.&lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns.&lt;br /&gt;
&lt;br /&gt;
⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
Update Jan 18, 2025&lt;br /&gt;
&lt;br /&gt;
Around 10AM Arc experienced an electrical power brownout.  An as-yet-unknown number of nodes lost electrical power during this time, causing the loss of a number of running jobs.  &lt;br /&gt;
&lt;br /&gt;
Sorry for the inconvenience.  &lt;br /&gt;
&lt;br /&gt;
Since Arc is shutting down for maintenance on Monday Jan 20, replacement jobs will likely not start unless they request a time limit shorter than the time remaining until 8AM Monday.  &lt;br /&gt;
⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Maintenance Complete&lt;br /&gt;
| date = 2025/01/22&lt;br /&gt;
| message = The ARC/Think cluster upgrade is complete.&lt;br /&gt;
&lt;br /&gt;
During this time the following changes happened:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet replaced the 11-year-old, unsupported InfiniBand interconnect on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node (MPI) jobs running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to use a partition such as cpu2019, cpu2021, or cpu2022.&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer was replaced successfully.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system was updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system was upgraded.&lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address down&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca Unavailable&lt;br /&gt;
&lt;br /&gt;
Please be informed that our support email address (support@hpc.ucalgary.ca) for RCS is currently not working. We are working to restore it as soon as possible. Please keep an eye on this space for updates. The clusters are working normally, but support will not receive your messages at this time. We will begin responding as soon as the address is restored. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address functional&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca is back&lt;br /&gt;
&lt;br /&gt;
support@hpc.ucalgary.ca has been repaired and RCS can be contacted there. If you had reached out for assistance in recent days without response please follow up as we may not have received your initial email. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Trend Micro Installation&lt;br /&gt;
| date = 2025/04/29&lt;br /&gt;
| message = To increase the security posture of the Arc cluster, administrators will be installing Trend Micro on the cluster login nodes over the week starting Apr 30.  Please report any issues to support@hpc.ucalgary.ca.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Planned outage in Jan 2026&lt;br /&gt;
| date = 2025/12/26&lt;br /&gt;
| message = ⚠️ ARC Update Week of January 12&lt;br /&gt;
&lt;br /&gt;
ARC will be going down periodically during the week of 2026/Jan/12 to allow for system maintenance. All nodes and filesystems will be unavailable during the outage, and filesystem performance may be variable for a time after the upgrade until the changes are complete. &lt;br /&gt;
&lt;br /&gt;
If you have questions or concerns, please contact support@hpc.ucalgary.ca&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
[[Category:ARC]]&lt;br /&gt;
{{Navbox ARC}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=MARC_Cluster_Status&amp;diff=3957</id>
		<title>MARC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=MARC_Cluster_Status&amp;diff=3957"/>
		<updated>2025-12-16T19:45:22Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Cluster Status&lt;br /&gt;
| cluster = MARC&lt;br /&gt;
| status = green&lt;br /&gt;
| title = Cluster operational&lt;br /&gt;
| message = See the [[MARC Cluster Status]] page for system notices. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 23, 2023, the MARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The MARC login node will reboot on the morning of January 23. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by January 27.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/24&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS Updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on MARC Login Node&lt;br /&gt;
| date = 2023/06/23&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the MARC login node. If apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
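&lt;br /&gt;
For example, a container can be run inside a job directly from the login node (the resource values and the image name mycontainer.sif are placeholders):&lt;br /&gt;
&lt;br /&gt;
 srun -N 1 -n 1 -t 01:00:00 apptainer exec mycontainer.sif hostname   # runs on a compute node, where user namespaces are available&lt;br /&gt;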
&lt;br /&gt;
This is a temporary measure due to a security vulnerability that will be patched soon.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Storage Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2023/10/23&lt;br /&gt;
| message =&lt;br /&gt;
We will be performing storage upgrades on the MARC/ARC cluster on &lt;br /&gt;
November 16 and 17, 2023. To facilitate this, we will be throttling &lt;br /&gt;
down the number of jobs on both clusters while the upgrades are &lt;br /&gt;
performed.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = OS Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2024/09/11&lt;br /&gt;
| message =&lt;br /&gt;
MARC will be going down for OS upgrades on 2024/Sep/16. The cluster &lt;br /&gt;
will be unavailable temporarily to complete this work. Please contact&lt;br /&gt;
support@hpc.ucalgary.ca if you have any questions or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance &lt;br /&gt;
| date = 2024/12/11&lt;br /&gt;
| message = The MARC login node will be rebooted on Tuesday December 17 for scheduled maintenance. It will be down for a few minutes and return shortly. Job scheduling and jobs running on the cluster will not be affected. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/07&lt;br /&gt;
| message = The MARC cluster will be rebooted for OS updates on Monday January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back, but queued jobs will remain in the queue and nodes will start scheduling when the cluster is ready. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address down&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca Unavailable&lt;br /&gt;
&lt;br /&gt;
Please be informed that our support email address (support@hpc.ucalgary.ca) for RCS is currently not working. We are working to restore it as soon as possible. Please keep an eye on this space for updates. The clusters are working normally, but support will not receive your messages at this time. We will begin responding as soon as the address is restored. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address functional&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca is back&lt;br /&gt;
&lt;br /&gt;
support@hpc.ucalgary.ca has been repaired and RCS can be contacted there. If you had reached out for assistance in recent days without response please follow up as we may not have received your initial email. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = MARC Scheduled File System Maintenance&lt;br /&gt;
| date = 2025/06/09&lt;br /&gt;
| message = Please be advised MARC will be going down for a period of approximately 2 hours starting at 10 AM June 17, 2025. Logins will not be available and no jobs will be running during this window. &lt;br /&gt;
&lt;br /&gt;
Please send any questions or concerns to support@hpc.ucalgary.ca. Apologies for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = MARC Maintenance Complete&lt;br /&gt;
| date = 2025/06/17&lt;br /&gt;
| message = Filesystem maintenance complete.&lt;br /&gt;
&lt;br /&gt;
Please send any questions or concerns to support@hpc.ucalgary.ca.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Planned outage in Jan 2026&lt;br /&gt;
| date = 2025/12/26&lt;br /&gt;
| message = ⚠️ MARC Update Week of January 12&lt;br /&gt;
&lt;br /&gt;
MARC will be going down periodically during the week of 2026/Jan/12 to allow for system maintenance. All nodes and filesystems will be unavailable during the outage, and filesystem performance may be variable for a time after the upgrade until the changes are complete. &lt;br /&gt;
&lt;br /&gt;
If you have questions or concerns, please contact support@hpc.ucalgary.ca&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=TALC_Cluster_Status&amp;diff=3956</id>
		<title>TALC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=TALC_Cluster_Status&amp;diff=3956"/>
		<updated>2025-12-16T19:45:10Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{TALC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ May System Updates&lt;br /&gt;
| date = 2023/02/02&lt;br /&gt;
| message =&lt;br /&gt;
Beginning May 1, 2023, the TALC cluster will undergo operating system updates. The upgrade will happen after the end of term to minimize any disruption. Any existing jobs may be &lt;br /&gt;
temporarily held from scheduling. The upgrade is planned to be fully complete by May 5.&lt;br /&gt;
&lt;br /&gt;
The TALC login node will reboot on the morning of May 1.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = May System Updates Completed&lt;br /&gt;
| date = 2023/05/04&lt;br /&gt;
| message =&lt;br /&gt;
TALC upgrades have been completed. If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Power Interruption&lt;br /&gt;
| date = 2024/05/07&lt;br /&gt;
| message = TALC experienced a brief power outage around 11AM May 7, 2024.&lt;br /&gt;
Most compute nodes have rebooted or are rebooting.  Most jobs running at the time &lt;br /&gt;
were lost. Administrators are actively working on restarting compute &lt;br /&gt;
nodes. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/06/26&lt;br /&gt;
| message = &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address down&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca Unavailable&lt;br /&gt;
&lt;br /&gt;
Please be informed that our support email address (support@hpc.ucalgary.ca) for RCS is currently not working. We are working to restore it as soon as possible. Please keep an eye on this space for updates. The clusters are working normally, but support will not receive your messages at this time. We will begin responding as soon as the address is restored. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address functional&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca is back&lt;br /&gt;
&lt;br /&gt;
support@hpc.ucalgary.ca has been repaired and RCS can be contacted there. If you had reached out for assistance in recent days without response please follow up as we may not have received your initial email. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Trend Micro Installation&lt;br /&gt;
| date = 2025/04/29&lt;br /&gt;
| message = To increase the security posture of the Arc cluster, administrators will be installing Trend Micro on the cluster login nodes over the week starting Apr 30.  Please report any issues to support@hpc.ucalgary.ca.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Planned outage in Jan 2026&lt;br /&gt;
| date = 2025/12/26&lt;br /&gt;
| message = ⚠️ TALC Update Week of January 12&lt;br /&gt;
&lt;br /&gt;
TALC will be going down periodically during the week of 2026/Jan/12 to allow for system maintenance. All nodes and filesystems will be unavailable during the outage, and filesystem performance may be variable for a time after the upgrade until the changes are complete. &lt;br /&gt;
&lt;br /&gt;
If you have questions or concerns, please contact support@hpc.ucalgary.ca&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
[[Category:TALC]]&lt;br /&gt;
{{Navbox TALC}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=MARC_Cluster_Status&amp;diff=3955</id>
		<title>MARC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=MARC_Cluster_Status&amp;diff=3955"/>
		<updated>2025-12-16T19:44:36Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Cluster Status&lt;br /&gt;
| cluster = MARC&lt;br /&gt;
| status = green&lt;br /&gt;
| title = Cluster operational&lt;br /&gt;
| message = See the [[MARC Cluster Status]] page for system notices. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 23, 2023, the MARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The MARC login node will reboot on the morning of January 23. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by January 27.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/24&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS Updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on MARC Login Node&lt;br /&gt;
| date = 2023/06/23&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the MARC login node. If apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to a security vulnerability that will be patched soon.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Storage Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2023/10/23&lt;br /&gt;
| message =&lt;br /&gt;
We will be performing storage upgrades on the MARC/ARC cluster on &lt;br /&gt;
November 16 and 17, 2023. To facilitate this, we will be throttling &lt;br /&gt;
down the number of jobs on both clusters while the upgrades are &lt;br /&gt;
performed.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = OS Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2024/09/11&lt;br /&gt;
| message =&lt;br /&gt;
MARC will be going down for OS upgrades on 2024/Sep/16. The cluster &lt;br /&gt;
will be unavailable temporarily to complete this work. Please contact&lt;br /&gt;
support@hpc.ucalgary.ca if you have any questions or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance &lt;br /&gt;
| date = 2024/12/11&lt;br /&gt;
| message = The MARC login node will be rebooted on Tuesday December 17 for scheduled maintenance. It will be down for a few minutes and return shortly. Job scheduling and jobs running on the cluster will not be affected. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/07&lt;br /&gt;
| message = The MARC cluster will be rebooted for OS updates on Monday January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back, but queued jobs will remain in the queue and nodes will start scheduling when the cluster is ready. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address down&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca Unavailable&lt;br /&gt;
&lt;br /&gt;
Please be informed that our support email address (support@hpc.ucalgary.ca) for RCS is currently not working. We are working to restore it as soon as possible. Please keep an eye on this space for updates. The clusters are working normally, but support will not receive your messages at this time. We will begin responding as soon as the address is restored. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address functional&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca is back&lt;br /&gt;
&lt;br /&gt;
support@hpc.ucalgary.ca has been repaired and RCS can be contacted there. If you had reached out for assistance in recent days without response please follow up as we may not have received your initial email. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = MARC Scheduled File System Maintenance&lt;br /&gt;
| date = 2025/06/09&lt;br /&gt;
| message = Please be advised MARC will be going down for a period of approximately 2 hours starting at 10 AM June 17, 2025. Logins will not be available and no jobs will be running during this window. &lt;br /&gt;
&lt;br /&gt;
Please send any questions or concerns to support@hpc.ucalgary.ca. Apologies for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = MARC Maintenance Complete&lt;br /&gt;
| date = 2025/06/17&lt;br /&gt;
| message = Filesystem maintenance complete.&lt;br /&gt;
&lt;br /&gt;
Please send any questions or concerns to support@hpc.ucalgary.ca.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Planned outage in Jan 2026&lt;br /&gt;
| date = 2025/12/26&lt;br /&gt;
| message = ⚠️ MARC Update Week of January 12&lt;br /&gt;
&lt;br /&gt;
MARC will be going down periodically during the week of 2026/Jan/12 to allow for system maintenance. All nodes and filesystems will be unavailable during the outage, and filesystem performance may be variable for a time after the upgrade until the changes are complete. &lt;br /&gt;
&lt;br /&gt;
If you have questions or concerns, please contact support@hpc.ucalgary.ca&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3954</id>
		<title>ARC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3954"/>
		<updated>2025-12-16T19:44:01Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 16, 2023, the ARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The ARC login node will reboot on the morning of January 16. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by January 20.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/24&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS Updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/02/28&lt;br /&gt;
| message =&lt;br /&gt;
We are currently investigating a filesystem issue that is causing filesystem slowdowns across ARC.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/03/1&lt;br /&gt;
| message =&lt;br /&gt;
We are still currently investigating a filesystem issue that is causing filesystem slowdowns across ARC. Some jobs on ARC have been paused to help us find the root cause of the slowdowns.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ARC Login node reboot&lt;br /&gt;
| date = 2023/03/2&lt;br /&gt;
| message =&lt;br /&gt;
The ARC login node will be rebooted this afternoon for an emergency maintenance. This downtime is needed to help mitigate the filesystem slowdowns experienced on the login node.  Jobs will continue running and scheduling during this time.&lt;br /&gt;
&lt;br /&gt;
All logins to the ARC login node will be terminated at 3:00PM and will remain unavailable until 4:00PM.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Filesystem Issues&lt;br /&gt;
| date = 2023/03/2&lt;br /&gt;
| message =&lt;br /&gt;
We are still currently investigating a filesystem issue that is causing filesystem slowdowns on specific nodes in our MSRDC location.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues Resolved&lt;br /&gt;
| date = 2023/03/10&lt;br /&gt;
| message =&lt;br /&gt;
We have upgraded the filesystem routers in our MSRDC location to address the performance issues.&lt;br /&gt;
&lt;br /&gt;
Please let us know if you experience any issues with the filesystem performance.&lt;br /&gt;
&lt;br /&gt;
Thank-you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/05/01&lt;br /&gt;
| message =&lt;br /&gt;
On May 1, 2023, the ARC Open OnDemand node will be rebooted between 5PM and 6PM. Expected downtime will be approximately 15 minutes.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on ARC Login Node&lt;br /&gt;
| date = 2023/06/22&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the Arc login node. If apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to a security vulnerability that will be patched soon.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Lattice, Single, cpu2013 partition changes&lt;br /&gt;
| date = 2023/07/13&lt;br /&gt;
| message =&lt;br /&gt;
The Lattice, Single, and cpu2013 partitions have all been decommissioned.  The Single&lt;br /&gt;
partition will be replaced by the nodes formerly in the cpu2013 partition, but&lt;br /&gt;
it will still be called single.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/10/17&lt;br /&gt;
| message =&lt;br /&gt;
Open OnDemand will be rebooted on October 17, 2023 for an update. It will be down for up to 30 minutes.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Storage Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2023/10/23&lt;br /&gt;
| message =&lt;br /&gt;
We will be performing storage upgrades on the MARC/ARC cluster on &lt;br /&gt;
November 16 and 17, 2023. To facilitate this, we will be throttling &lt;br /&gt;
down the number of jobs on both clusters while the upgrades are &lt;br /&gt;
performed.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/05/3&lt;br /&gt;
| message =&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Power Interruption&lt;br /&gt;
| date = 2024/05/07&lt;br /&gt;
| message = Arc experienced a brief power outage around 11AM May 7, 2024.&lt;br /&gt;
Most compute nodes have rebooted or are rebooting.  Most jobs running at the time &lt;br /&gt;
were lost. Arc administrators are actively working on restarting compute &lt;br /&gt;
nodes. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation&lt;br /&gt;
| date = 2024/06/03&lt;br /&gt;
| message = Job submissions targeted to the GPU a100 partition will be &lt;br /&gt;
affected by a temporary reservation on the nodes to accommodate the RCS&lt;br /&gt;
summer school class taking place on 2024/Jun/10. The reservation will end &lt;br /&gt;
shortly afterwards. Please submit your jobs normally and the scheduler will &lt;br /&gt;
start them as soon as the nodes are available. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation Removed&lt;br /&gt;
| date = 2024/06/11&lt;br /&gt;
| message = GPU a100 Nodes in ARC have been returned to normal scheduling. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/23&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable &lt;br /&gt;
from Sept 23 to Sept 27 inclusive (subject to change). All compute nodes &lt;br /&gt;
in cpu2019, cpu2021/2, and gpu-v100, as well as gpu-a100 and most nodes from bigmem, will be &lt;br /&gt;
affected. These nodes will return to service as soon as the work is complete.  &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues that are blocking our original maintenance window, most compute nodes that were taken offline on Monday have been brought back online today. An additional partial outage for the same nodes will begin next Tuesday.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control, the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Normal Scheduling has resumed. &lt;br /&gt;
| date = 2024/10/08&lt;br /&gt;
| message = The ARC cluster has been successfully brought online and nodes are running jobs normally. We apologize for the extended downtime. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance &lt;br /&gt;
| date = 2024/12/11&lt;br /&gt;
| message = The ARC login node will be rebooted on Tuesday December 17 for scheduled maintenance. It will be down for a few minutes and return shortly. Job scheduling and jobs running on the cluster will not be affected. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/07&lt;br /&gt;
| message = The ARC cluster will be rebooted for OS updates on Monday January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back, but queued jobs will remain in the queue and nodes will start scheduling when the cluster is ready. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/15&lt;br /&gt;
| message = The ARC cluster will be down for maintenance and upgrades starting 9AM Monday, January 20, 2025 through Wednesday, January 22, 2025. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the duration of the upgrade window:&lt;br /&gt;
* Scheduling will be paused and new jobs will be queued. Any queued jobs will start scheduling only after the upgrade is complete.&lt;br /&gt;
* Access to files via the login node and arc-dtn will generally be available but may be intermittent. File transfers on the DTN node, including Globus file transfers, may be interrupted during this window.&lt;br /&gt;
&lt;br /&gt;
Please make sure to save your work prior to this outage window to avoid any loss of work.&lt;br /&gt;
&lt;br /&gt;
During this time the following changes will happen:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet will replace the 11-year-old, unsupported InfiniBand interconnect on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node (MPI) jobs running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to use a partition such as cpu2019, cpu2021, or cpu2022.&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer will be replaced. Access to /bulk will be unavailable on Wednesday, January 22, 2025.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system will be updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system will be upgraded.&lt;br /&gt;
&lt;br /&gt;
5. The Open OnDemand web portal will be upgraded.&lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns.&lt;br /&gt;
&lt;br /&gt;
⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
Update Jan 18, 2025&lt;br /&gt;
&lt;br /&gt;
Around 10AM Arc experienced an electrical power brownout.  An as-yet-unknown number of nodes lost electrical power during this time, causing the loss of a number of running jobs.  &lt;br /&gt;
&lt;br /&gt;
Sorry for the inconvenience.  &lt;br /&gt;
&lt;br /&gt;
Since Arc is shutting down for maintenance on Monday Jan 20, replacement jobs will likely not start unless they request a time limit shorter than the time remaining until 8AM Monday.  &lt;br /&gt;
⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Maintenance Complete&lt;br /&gt;
| date = 2025/01/22&lt;br /&gt;
| message = The ARC cluster upgrade is complete.&lt;br /&gt;
&lt;br /&gt;
During this time the following changes happened:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet replaced the 11-year-old, unsupported InfiniBand interconnect on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node (MPI) jobs running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to use a partition such as cpu2019, cpu2021, or cpu2022.&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer was replaced successfully.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system was updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system was upgraded.&lt;br /&gt;
&lt;br /&gt;
5. The Open OnDemand web portal was upgraded.&lt;br /&gt;
&lt;br /&gt;
6. The Parallel partition was renamed to Legacy to reflect its lack of an interconnect suitable for parallel MPI work, and it is now restricted to jobs of at most 4 nodes.&lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns.&lt;br /&gt;
&lt;br /&gt;
Jan 23, 9:08AM&lt;br /&gt;
&lt;br /&gt;
Remount complete.  ARC is back in full service.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Module Command Upgrade&lt;br /&gt;
| date = 2025/02/03&lt;br /&gt;
| message = Upgrade of the module command&lt;br /&gt;
&lt;br /&gt;
On Tuesday, February 11, 2025 the module command will be upgraded to a new version on Arc. This should result in new capabilities and a slightly different visual experience when using the module command.  Loading modules is not expected to change.&lt;br /&gt;
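&lt;br /&gt;
Typical usage should look the same after the upgrade; for example (the module names below are illustrative and depend on what is installed):&lt;br /&gt;
&lt;br /&gt;
 module avail                 # list available modules&lt;br /&gt;
 module load python/3.10      # load a module&lt;br /&gt;
 module list                  # show currently loaded modules&lt;br /&gt;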
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Module Command Upgraded&lt;br /&gt;
| date = 2025/02/12&lt;br /&gt;
| message = The module command was upgraded&lt;br /&gt;
&lt;br /&gt;
On Tuesday, February 11, 2025 the module command was upgraded to a new version on Arc. This should result in new capabilities and a slightly different visual experience when using the module command.  Loading modules should not have changed.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address down&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca Unavailable&lt;br /&gt;
&lt;br /&gt;
Please be informed that our support email address (support@hpc.ucalgary.ca) for RCS is currently not working. We are working to restore it as soon as possible. Please keep an eye on this space for updates. The clusters are working normally, but support will not receive your messages at this time. We will begin responding as soon as the address is restored. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address functional&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca is back&lt;br /&gt;
&lt;br /&gt;
support@hpc.ucalgary.ca has been repaired and RCS can be contacted there. If you had reached out for assistance in recent days without response please follow up as we may not have received your initial email. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Interactive Job Timelimit will be Enforced&lt;br /&gt;
| date = 2025/04/11&lt;br /&gt;
| message = In order to improve the scheduling and job throughput efficiency of ARC, interactive jobs will be limited to a maximum of 5 hours of runtime.  Interactive jobs that are submitted with a time limit over 5 hours will be rejected at submission time.  This change will be made on Monday, April 28, 2025.&lt;br /&gt;
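&lt;br /&gt;
For example, once the limit is enforced (the time values below are illustrative):&lt;br /&gt;
&lt;br /&gt;
 salloc --time=04:00:00   # accepted: at or under the 5 hour limit&lt;br /&gt;
 salloc --time=06:00:00   # rejected at submission: over the 5 hour limit&lt;br /&gt;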
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Interactive Job Timelimit Is Now Enforced&lt;br /&gt;
| date = 2025/04/28&lt;br /&gt;
| message = In order to improve the scheduling and job throughput efficiency of ARC, interactive jobs are now limited to a maximum of 5 hours of runtime.  Interactive jobs that are submitted with a time limit over 5 hours will be rejected at submission time.&lt;br /&gt;
&lt;br /&gt;
Apr 29, 2025&lt;br /&gt;
To increase the security posture of the Arc cluster, administrators will be installing Trend Micro on the cluster login nodes over the week starting Apr 30.  Please report any issues to support@hpc.ucalgary.ca.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Bulk Filesystem Emergency Maintenance&lt;br /&gt;
| date = 2025/07/29&lt;br /&gt;
| message = The filer that provides the /bulk filesystem will be down for emergency repairs at 12 Noon on Thursday July 31.  No access to files on /bulk will be possible for the duration of the multi-hour outage.  Jobs that access /bulk will start but will pause as soon as they attempt to access it; they should continue once service is restored.  Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Bulk Filesystem Maintenance Complete&lt;br /&gt;
| date = 2025/08/01&lt;br /&gt;
| message = Maintenance on /bulk was completed successfully and all filesystems are back in service on Arc.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU partition changes on ARC&lt;br /&gt;
| date = 2025/08/15&lt;br /&gt;
| message = New gpu-h100 and gpu-l40 partitions are available for general scheduling with new GPU hardware. gpu-v100 also has 6 fewer nodes. You can view more details about the node specs by running the arc.hardware script on the login node. &lt;br /&gt;
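&lt;br /&gt;
For instance, from the login node (arc.hardware is the site script named above; sinfo is the standard Slurm alternative):&lt;br /&gt;
&lt;pre&gt;&lt;br /&gt;
arc.hardware                 # site script: summarizes node specs per partition&lt;br /&gt;
sinfo --partition=gpu-h100   # standard Slurm view of the new partition&lt;br /&gt;
&lt;/pre&gt;&lt;br /&gt;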
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Update and Bulk outage on ARC&lt;br /&gt;
| date = 2025/09/15&lt;br /&gt;
| message = ⚠️ Arc Update September 15-19&lt;br /&gt;
&lt;br /&gt;
An update of the Arc cluster will begin on Sep 15.  This will result in fewer available resources while the compute nodes are restarted.  The login node and scheduler will be restarted on Sep 17.&lt;br /&gt;
&lt;br /&gt;
⚠️ Bulk Filesystem Maintenance September 17&lt;br /&gt;
The filer that provides the /bulk filesystem will be down for emergency repairs at 9 AM on Wednesday Sep 17th. No access to files on /bulk will be possible for the duration of the multi-hour outage. Any jobs running that access /bulk will start and then pause when access to /bulk is attempted. Jobs should continue once service is restored.  We apologize for any inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Update and Bulk outage on ARC&lt;br /&gt;
| date = 2025/09/18&lt;br /&gt;
| message = Arc update complete&lt;br /&gt;
&lt;br /&gt;
The update to ARC is complete.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU partition changes on ARC&lt;br /&gt;
| date = 2025/09/23&lt;br /&gt;
| message = gpu-v100 has 6 more nodes. You can view more details about the node specs by running the arc.hardware script on the login node. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2025/11/4&lt;br /&gt;
| message =&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Legacy compute nodes are being retired&lt;br /&gt;
| date = 2025/11/20&lt;br /&gt;
| message = ARC nodes cn[0513-1096] are being removed from the ARC cluster. They will be removed from scheduling and then retired from the cluster over the coming weeks. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Planned outage in Jan 2026&lt;br /&gt;
| date = 2025/12/26&lt;br /&gt;
| message = ⚠️ Arc Update Week of January 12&lt;br /&gt;
&lt;br /&gt;
ARC will be going down periodically during the week of 2026/Jan/12 to allow for system maintenance. All nodes and filesystems will be unavailable during the outage, and filesystem performance may be variable for a time after the upgrade until the changes are complete. &lt;br /&gt;
&lt;br /&gt;
If you have questions or concerns, please contact support@hpc.ucalgary.ca&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Navbox ARC}}&lt;br /&gt;
[[Category:ARC]]&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3953</id>
		<title>ARC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3953"/>
		<updated>2025-12-16T19:41:22Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 16, 2023, the ARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The ARC login node will reboot on the morning of January 16. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by January 20.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/24&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS Updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/02/28&lt;br /&gt;
| message =&lt;br /&gt;
We are currently investigating a filesystem issue that is causing filesystem slowdowns across ARC.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/03/1&lt;br /&gt;
| message =&lt;br /&gt;
We are still investigating a filesystem issue that is causing filesystem slowdowns across ARC. Some jobs on ARC have been paused to help us find the root cause of the slowdowns.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ARC Login node reboot&lt;br /&gt;
| date = 2023/03/2&lt;br /&gt;
| message =&lt;br /&gt;
The ARC login node will be rebooted this afternoon for emergency maintenance. This downtime is needed to help mitigate the filesystem slowdowns experienced on the login node.  Jobs will continue running and scheduling during this time.&lt;br /&gt;
&lt;br /&gt;
All logins to the ARC login node will be terminated at 3:00PM, and the node will remain unavailable until 4:00PM.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Filesystem Issues&lt;br /&gt;
| date = 2023/03/2&lt;br /&gt;
| message =&lt;br /&gt;
We are still investigating a filesystem issue that is causing filesystem slowdowns on specific nodes in our MSRDC location.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues Resolved&lt;br /&gt;
| date = 2023/03/10&lt;br /&gt;
| message =&lt;br /&gt;
We have upgraded the filesystem routers in our MSRDC location to address the performance issues.&lt;br /&gt;
&lt;br /&gt;
Please let us know if you experience any issues with the filesystem performance.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/05/01&lt;br /&gt;
| message =&lt;br /&gt;
On May 1, 2023, the ARC Open OnDemand node will be rebooted between 5PM and 6PM. Expected downtime will be approximately 15 minutes.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on ARC Login Node&lt;br /&gt;
| date = 2023/06/22&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the Arc login node. If apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to a security vulnerability that will be&lt;br /&gt;
patched soon.&lt;br /&gt;
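&lt;br /&gt;
For example, a container can be run inside a batch job along these lines (a sketch only; the image name and resource options are placeholders):&lt;br /&gt;
&lt;pre&gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --mem=4G&lt;br /&gt;
# inside a job the login-node namespace restriction does not apply&lt;br /&gt;
apptainer exec mycontainer.sif echo "hello from the container"&lt;br /&gt;
&lt;/pre&gt;&lt;br /&gt;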
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Lattice, Single, cpu2013 partition changes&lt;br /&gt;
| date = 2023/07/13&lt;br /&gt;
| message =&lt;br /&gt;
The Lattice, Single, and cpu2013 partitions have all been decommissioned.  The Single&lt;br /&gt;
partition will be replaced by the nodes formerly in the cpu2013 partition, and the&lt;br /&gt;
replacement partition will be called single.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/10/17&lt;br /&gt;
| message =&lt;br /&gt;
Open OnDemand will be rebooted on October 17, 2023 for an update. It will be down for up to 30 minutes.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Storage Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2023/10/23&lt;br /&gt;
| message =&lt;br /&gt;
We will be performing storage upgrades on the MARC/ARC cluster on &lt;br /&gt;
November 16 and 17, 2023. To facilitate this, we will be throttling &lt;br /&gt;
down the number of jobs on both clusters while the upgrades are &lt;br /&gt;
performed.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/05/3&lt;br /&gt;
| message =&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Power Interruption&lt;br /&gt;
| date = 2024/05/07&lt;br /&gt;
| message = Arc experienced a brief power outage around 11AM on May 7, 2024.&lt;br /&gt;
Most compute nodes have rebooted or are rebooting.  Most jobs running at the time &lt;br /&gt;
were lost. Arc administrators are actively working on restarting compute &lt;br /&gt;
nodes. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation&lt;br /&gt;
| date = 2024/06/03&lt;br /&gt;
| message = Job submissions targeted to the GPU a100 partition will be &lt;br /&gt;
affected by a temporary reservation on the nodes to accommodate the RCS&lt;br /&gt;
summer school class taking place on 2024/Jun/10. The reservation will end &lt;br /&gt;
shortly afterwards. Please submit your jobs normally and the scheduler will &lt;br /&gt;
start them as soon as the nodes are available. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation Removed&lt;br /&gt;
| date = 2024/06/11&lt;br /&gt;
| message = GPU a100 Nodes in ARC have been returned to normal scheduling. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/23&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable &lt;br /&gt;
between Sept 23 and Sept 27 inclusive (subject to change). All compute nodes &lt;br /&gt;
in cpu2019, cpu2021/2, and gpu-v100, plus most nodes from bigmem and gpu-a100, will be &lt;br /&gt;
affected. These nodes will return to service as soon as the work is complete.  &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues blocking our original maintenance window, most compute nodes that were taken offline on Monday have been brought back online today. An additional partial outage, affecting the same nodes, will begin next Tuesday.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control, the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Normal Scheduling has resumed. &lt;br /&gt;
| date = 2024/10/08&lt;br /&gt;
| message = The ARC cluster has been successfully brought online and nodes are running jobs normally. We apologize for the extended downtime. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance &lt;br /&gt;
| date = 2024/12/11&lt;br /&gt;
| message = The ARC login node will be rebooted on Tuesday December 17 for scheduled maintenance. It will be down for a few minutes and return shortly. Job scheduling and jobs running on the cluster will not be affected. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/07&lt;br /&gt;
| message = The ARC cluster will be rebooted for OS updates on Monday January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back, but queued jobs will remain in the queue and nodes will start scheduling when the cluster is ready. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/15&lt;br /&gt;
| message = The ARC cluster will be down for maintenance and upgrades from 9AM Monday, January 20, 2025 through Wednesday, January 22, 2025. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the duration of the upgrade window:&lt;br /&gt;
* Scheduling will be paused and new jobs will be queued. Any queued jobs will start scheduling only after the upgrade is complete.&lt;br /&gt;
* Access to files via the login node and arc-dtn will generally be available but intermittent. File transfers on the DTN node, including Globus file transfers, may be interrupted during this window.&lt;br /&gt;
&lt;br /&gt;
Please make sure to save your work prior to this outage window to avoid any loss of work.&lt;br /&gt;
&lt;br /&gt;
During this time the following changes will happen:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet will replace the 11-year-old, unsupported InfiniBand on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node jobs (MPI) running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, or cpu2022; a brief submission sketch follows the numbered list below.&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer will be replaced. Access to /bulk will be unavailable on Wednesday, January 22, 2025.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system will be updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system will be upgraded.&lt;br /&gt;
&lt;br /&gt;
5. The Open OnDemand web portal will be upgraded.&lt;br /&gt;
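&lt;br /&gt;
As mentioned in item 1, a multi-node MPI job targeting a partition that keeps a low-latency interconnect could be submitted along these lines (a sketch only; the node counts, time, and application name are placeholders):&lt;br /&gt;
&lt;pre&gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --partition=cpu2019&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=4&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
srun ./my_mpi_app   # placeholder MPI executable&lt;br /&gt;
&lt;/pre&gt;&lt;br /&gt;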
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns.&lt;br /&gt;
&lt;br /&gt;
⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
Update Jan 18, 2025&lt;br /&gt;
&lt;br /&gt;
Around 10AM Arc experienced an electrical power brownout.  An as-yet-unknown fraction of the nodes lost electrical power during this time, causing the loss of a number of running jobs.  &lt;br /&gt;
&lt;br /&gt;
Sorry for the inconvenience.  &lt;br /&gt;
&lt;br /&gt;
Since Arc is shutting down for maintenance on Monday Jan 20, replacement jobs will likely not start unless they request a timelimit shorter than the time remaining until 8AM Monday.  &lt;br /&gt;
⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Maintenance Complete&lt;br /&gt;
| date = 2025/01/22&lt;br /&gt;
| message = The ARC cluster upgrade is complete&lt;br /&gt;
&lt;br /&gt;
During this time the following changes happened:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet replaced the 11-year-old, unsupported InfiniBand on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node jobs (MPI) running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, cpu2022.&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer was replaced successfully.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system was updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system was upgraded.&lt;br /&gt;
&lt;br /&gt;
5. The Open OnDemand web portal was upgraded.&lt;br /&gt;
&lt;br /&gt;
6. The Parallel partition was renamed to Legacy to reflect the lack of a high-performance interconnect for parallel MPI work, and was restricted to a maximum of 4-node jobs.&lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns.&lt;br /&gt;
&lt;br /&gt;
Jan 23, 9:08AM&lt;br /&gt;
&lt;br /&gt;
Remount complete.  ARC is back in full service.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Module Command Upgrade&lt;br /&gt;
| date = 2025/02/03&lt;br /&gt;
| message = Upgrade of the module command&lt;br /&gt;
&lt;br /&gt;
On Tuesday, February 11, 2025 the module command will be upgraded to a new version on Arc. This should result in new capabilities and a slightly different visual experience when using the module command.  Loading modules is not expected to change.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Module Command Upgraded&lt;br /&gt;
| date = 2025/02/12&lt;br /&gt;
| message = The module command was upgraded&lt;br /&gt;
&lt;br /&gt;
On February 12, 2025 the module command was upgraded to a new version on Arc. This should result in new capabilities and a slightly different visual experience when using the module command.  Loading modules should not have changed.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address down&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca Unavailable&lt;br /&gt;
&lt;br /&gt;
Please be aware that our support email address (support@hpc.ucalgary.ca) for RCS is currently not working. We are working to restore it as soon as possible and will post updates here. The clusters are operating normally, but support will not receive your messages at this time. We will begin responding as soon as service is restored. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address functional&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca is back&lt;br /&gt;
&lt;br /&gt;
support@hpc.ucalgary.ca has been repaired and RCS can be contacted there. If you reached out for assistance in recent days without receiving a response, please follow up, as we may not have received your initial email. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Interactive Job Timelimit will be Enforced&lt;br /&gt;
| date = 2025/04/11&lt;br /&gt;
| message = In order to improve the scheduling and job throughput efficiency of ARC, interactive jobs will be limited to a maximum of 5 hours of runtime.  Interactive jobs that are submitted with a timelimit over 5 hours will be rejected at submission time.  This change will be made on Monday, April 28, 2025.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Interactive Job Timelimit Is Now Enforced&lt;br /&gt;
| date = 2025/04/28&lt;br /&gt;
| message = In order to improve the scheduling and job throughput efficiency of ARC, interactive jobs are now limited to a maximum of 5 hours of runtime.  Interactive jobs that are submitted with a timelimit over 5 hours will be rejected at submission time.&lt;br /&gt;
&lt;br /&gt;
Apr 29, 2025&lt;br /&gt;
To increase the security posture of the Arc cluster, administrators will be installing Trend Micro on the cluster login nodes during the week starting Apr 30.  Please report any inconsistencies to support@hpc.ucalgary.ca&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Bulk Filesystem Emergency Maintenance&lt;br /&gt;
| date = 2025/07/29&lt;br /&gt;
| message = The filer that provides the /bulk filesystem will be down for emergency repairs at 12 Noon on Thursday July 31.  No access to files on /bulk will be possible for the duration of the multi-hour outage.  Any jobs running that access /bulk will start and then pause when access to /bulk is attempted.  Jobs should continue once service is restored.  Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Bulk Filesystem Maintenance Complete&lt;br /&gt;
| date = 2025/08/01&lt;br /&gt;
| message = Maintenance on /bulk was completed successfully and all filesystems are back in service on Arc.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU partition changes on ARC&lt;br /&gt;
| date = 2025/08/15&lt;br /&gt;
| message = New gpu-h100 and gpu-l40 partitions are available for general scheduling with new GPU hardware. gpu-v100 also has 6 fewer nodes. You can view more details about the node specs by running the arc.hardware script on the login node. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Update and Bulk outage on ARC&lt;br /&gt;
| date = 2025/09/15&lt;br /&gt;
| message = ⚠️ Arc Update September 15-19&lt;br /&gt;
&lt;br /&gt;
An update of the Arc cluster will begin on Sep 15.  This will result in fewer available resources while the compute nodes are restarted.  The login node and scheduler will be restarted on Sep 17.&lt;br /&gt;
&lt;br /&gt;
⚠️ Bulk Filesystem Maintenance September 17&lt;br /&gt;
The filer that provides the /bulk filesystem will be down for emergency repairs at 9 AM on Wednesday Sep 17th. No access to files on /bulk will be possible for the duration of the multi-hour outage. Any jobs running that access /bulk will start and then pause when access to /bulk is attempted. Jobs should continue once service is restored.  We apologize for any inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Update and Bulk outage on ARC&lt;br /&gt;
| date = 2025/09/18&lt;br /&gt;
| message = Arc update complete&lt;br /&gt;
&lt;br /&gt;
The update to ARC is complete.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU partition changes on ARC&lt;br /&gt;
| date = 2025/09/23&lt;br /&gt;
| message = gpu-v100 has 6 more nodes. You can view more details about the node specs by running the arc.hardware script on the login node. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2025/11/4&lt;br /&gt;
| message =&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Legacy compute nodes are being retired&lt;br /&gt;
| date = 2025/11/20&lt;br /&gt;
| message = ARC nodes cn[0513-1096] are being removed from the ARC cluster. They will be removed from scheduling and then retired from the cluster over the coming weeks. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Planned outage in Jan 2026&lt;br /&gt;
| date = 2025/12/26&lt;br /&gt;
| message = ARC will be going down periodically during the week of 2026/Jan/12 to allow for system maintenance. All nodes and filesystems will be unavailable during the outage, and filesystem performance may be variable for a time after the upgrade until the changes are complete. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Navbox ARC}}&lt;br /&gt;
[[Category:ARC]]&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3948</id>
		<title>ARC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3948"/>
		<updated>2025-11-20T17:29:39Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 16, 2023, the ARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The ARC login node will reboot on the morning of January 16. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by January 20.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/24&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS Updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/02/28&lt;br /&gt;
| message =&lt;br /&gt;
We are currently investigating a filesystem issue that is causing filesystem slowdowns across ARC.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/03/1&lt;br /&gt;
| message =&lt;br /&gt;
We are still investigating a filesystem issue that is causing filesystem slowdowns across ARC. Some jobs on ARC have been paused to help us find the root cause of the slowdowns.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ARC Login node reboot&lt;br /&gt;
| date = 2023/03/2&lt;br /&gt;
| message =&lt;br /&gt;
The ARC login node will be rebooted this afternoon for emergency maintenance. This downtime is needed to help mitigate the filesystem slowdowns experienced on the login node.  Jobs will continue running and scheduling during this time.&lt;br /&gt;
&lt;br /&gt;
All logins to the ARC login node will be terminated at 3:00PM, and the node will remain unavailable until 4:00PM.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Filesystem Issues&lt;br /&gt;
| date = 2023/03/2&lt;br /&gt;
| message =&lt;br /&gt;
We are still investigating a filesystem issue that is causing filesystem slowdowns on specific nodes in our MSRDC location.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues Resolved&lt;br /&gt;
| date = 2023/03/10&lt;br /&gt;
| message =&lt;br /&gt;
We have upgraded the filesystem routers in our MSRDC location to address the performance issues.&lt;br /&gt;
&lt;br /&gt;
Please let us know if you experience any issues with the filesystem performance.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/05/01&lt;br /&gt;
| message =&lt;br /&gt;
On May 1, 2023, the ARC Open OnDemand node will be rebooted between 5PM and 6PM. Expected downtime will be approximately 15 minutes.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on ARC Login Node&lt;br /&gt;
| date = 2023/06/22&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the Arc login node. If apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to a security vulnerability that will be&lt;br /&gt;
patched soon.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Lattice, Single, cpu2013 partition changes&lt;br /&gt;
| date = 2023/07/13&lt;br /&gt;
| message =&lt;br /&gt;
The Lattice, Single, and cpu2013 partitions have all been decommissioned.  The Single&lt;br /&gt;
partition will be replaced by the nodes formerly in the cpu2013 partition, and the&lt;br /&gt;
replacement partition will be called single.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/10/17&lt;br /&gt;
| message =&lt;br /&gt;
Open OnDemand will be rebooted on October 17, 2023 for an update. It will be down for up to 30 minutes.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Storage Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2023/10/23&lt;br /&gt;
| message =&lt;br /&gt;
We will be performing storage upgrades on the MARC/ARC cluster on &lt;br /&gt;
November 16 and 17, 2023. To facilitate this, we will be throttling &lt;br /&gt;
down the number of jobs on both clusters while the upgrades are &lt;br /&gt;
performed.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/05/3&lt;br /&gt;
| message =&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Power Interruption&lt;br /&gt;
| date = 2024/05/07&lt;br /&gt;
| message = Arc experienced a brief power outage around 11AM on May 7, 2024.&lt;br /&gt;
Most compute nodes have rebooted or are rebooting.  Most jobs running at the time &lt;br /&gt;
were lost. Arc administrators are actively working on restarting compute &lt;br /&gt;
nodes. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation&lt;br /&gt;
| date = 2024/06/03&lt;br /&gt;
| message = Job submissions targeted to the GPU a100 partition will be &lt;br /&gt;
affected by a temporary reservation on the nodes to accommodate the RCS&lt;br /&gt;
summer school class taking place on 2024/Jun/10. The reservation will end &lt;br /&gt;
shortly afterwards. Please submit your jobs normally and the scheduler will &lt;br /&gt;
start them as soon as the nodes are available. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation Removed&lt;br /&gt;
| date = 2024/06/11&lt;br /&gt;
| message = GPU a100 Nodes in ARC have been returned to normal scheduling. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/23&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable &lt;br /&gt;
between Sept 23 and Sept 27 inclusive (subject to change). All compute nodes &lt;br /&gt;
in cpu2019, cpu2021/2, and gpu-v100, plus most nodes from bigmem and gpu-a100, will be &lt;br /&gt;
affected. These nodes will return to service as soon as the work is complete.  &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues blocking our original maintenance window, most compute nodes that were taken offline on Monday have been brought back online today. An additional partial outage, affecting the same nodes, will begin next Tuesday.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control, the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Normal Scheduling has resumed. &lt;br /&gt;
| date = 2024/10/08&lt;br /&gt;
| message = The ARC cluster has been successfully brought online and nodes are running jobs normally. We apologize for the extended downtime. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance &lt;br /&gt;
| date = 2024/12/11&lt;br /&gt;
| message = The ARC login node will be rebooted on Tuesday December 17 for scheduled maintenance. It will be down for a few minutes and return shortly. Job scheduling and jobs running on the cluster will not be affected. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/07&lt;br /&gt;
| message = The ARC cluster will be rebooted for OS updates on Monday January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back, but queued jobs will remain in the queue and nodes will start scheduling when the cluster is ready. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/15&lt;br /&gt;
| message = The ARC cluster will be down for maintenance and upgrades from 9AM Monday, January 20, 2025 through Wednesday, January 22, 2025. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the duration of the upgrade window:&lt;br /&gt;
* Scheduling will be paused and new jobs will be queued. Any queued jobs will start scheduling only after the upgrade is complete.&lt;br /&gt;
* Access to files via the login node and arc-dtn will generally be available but intermittent. File transfers on the DTN node, including Globus file transfers, may be interrupted during this window.&lt;br /&gt;
&lt;br /&gt;
Please make sure to save your work prior to this outage window to avoid any loss of work.&lt;br /&gt;
&lt;br /&gt;
During this time the following changes will happen:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet will replace the 11-year-old, unsupported InfiniBand on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node jobs (MPI) running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, cpu2022.&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer will be replaced. Access to /bulk will be unavailable on Wednesday, January 22, 2025.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system will be updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system will be upgraded.&lt;br /&gt;
&lt;br /&gt;
5. The Open OnDemand web portal will be upgraded.&lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns.&lt;br /&gt;
&lt;br /&gt;
⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
Update Jan 18, 2025&lt;br /&gt;
&lt;br /&gt;
Around 10AM Arc experienced an electrical power brownout.  An as-yet-unknown fraction of the nodes lost electrical power during this time, causing the loss of a number of running jobs.  &lt;br /&gt;
&lt;br /&gt;
Sorry for the inconvenience.  &lt;br /&gt;
&lt;br /&gt;
Since Arc is shutting down for maintenance on Monday Jan 20, replacement jobs will likely not start unless they request a timelimit shorter than the time remaining until 8AM Monday.  &lt;br /&gt;
⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Maintenance Complete&lt;br /&gt;
| date = 2025/01/22&lt;br /&gt;
| message = The ARC cluster upgrade is complete&lt;br /&gt;
&lt;br /&gt;
During this time the following changes happened:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet replaced the 11-year-old, unsupported InfiniBand on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node jobs (MPI) running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, cpu2022.&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer was replaced successfully.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system was updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system was upgraded.&lt;br /&gt;
&lt;br /&gt;
5. The Open OnDemand web portal was upgraded.&lt;br /&gt;
&lt;br /&gt;
6. The Parallel partition was renamed to Legacy to reflect the lack of a high-performance interconnect for parallel MPI work, and was restricted to a maximum of 4-node jobs.&lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns.&lt;br /&gt;
&lt;br /&gt;
Jan 23, 9:08AM&lt;br /&gt;
&lt;br /&gt;
Remount complete.  ARC is back in full service.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Module Command Upgrade&lt;br /&gt;
| date = 2025/02/03&lt;br /&gt;
| message = Upgrade of the module command&lt;br /&gt;
&lt;br /&gt;
On Tuesday, February 11, 2025 the module command will be upgraded to a new version on Arc. This should result in new capabilities and a slightly different visual experience when using the module command.  Loading modules is not expected to change.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Module Command Upgraded&lt;br /&gt;
| date = 2025/02/12&lt;br /&gt;
| message = The module command was upgraded&lt;br /&gt;
&lt;br /&gt;
On February 12, 2025 the module command was upgraded to a new version on Arc. This should result in new capabilities and a slightly different visual experience when using the module command.  Loading modules should not have changed.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address down&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca Unavailable&lt;br /&gt;
&lt;br /&gt;
Please be aware that our support email address (support@hpc.ucalgary.ca) for RCS is currently not working. We are working to restore it as soon as possible and will post updates here. The clusters are operating normally, but support will not receive your messages at this time. We will begin responding as soon as service is restored. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address functional&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca is back&lt;br /&gt;
&lt;br /&gt;
support@hpc.ucalgary.ca has been repaired and RCS can be contacted there. If you reached out for assistance in recent days without receiving a response, please follow up, as we may not have received your initial email. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Interactive Job Timelimit will be Enforced&lt;br /&gt;
| date = 2025/04/11&lt;br /&gt;
| message = In order to improve the scheduling and job throughput efficiency of ARC, interactive jobs will be limited to a maximum of 5 hours of runtime.  Interactive jobs that are submitted with a timelimit over 5 hours will be rejected at submission time.  This change will be made on Monday, April 28, 2025.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Interactive Job Timelimit Is Now Enforced&lt;br /&gt;
| date = 2025/04/28&lt;br /&gt;
| message = In order to improve the scheduling and job throughput efficiency of ARC, interactive jobs are now limited to a maximum of 5 hours of runtime.  Interactive jobs that are submitted with a timelimit over 5 hours will be rejected at submission time.&lt;br /&gt;
&lt;br /&gt;
Apr 29, 2025&lt;br /&gt;
To increase the security posture of the Arc cluster, administrators will be installing Trend Micro on the cluster login nodes during the week starting Apr 30.  Please report any inconsistencies to support@hpc.ucalgary.ca&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Bulk Filesystem Emergency Maintenance&lt;br /&gt;
| date = 2025/07/29&lt;br /&gt;
| message = The filer that provides the /bulk filesystem will be down for emergency repairs at 12 Noon on Thursday July 31.  No access to files on /bulk will be possible for the duration of the multi-hour outage.  Any jobs running that access /bulk will start and then pause when access to /bulk is attempted.  Jobs should continue once service is restored.  Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Bulk Filesystem Maintenance Complete&lt;br /&gt;
| date = 2025/08/01&lt;br /&gt;
| message = Maintenance on /bulk was completed successfully and all filesystems are back in service on Arc.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU partition changes on ARC&lt;br /&gt;
| date = 2025/08/15&lt;br /&gt;
| message = New gpu-h100 and gpu-l40 partitions are available for general scheduling with new GPU hardware. gpu-v100 also has 6 fewer nodes. You can view more details about the node specs by running the arc.hardware script on the login node. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Update and Bulk outage on ARC&lt;br /&gt;
| date = 2025/09/15&lt;br /&gt;
| message = ⚠️ Arc Update September 15-19&lt;br /&gt;
&lt;br /&gt;
An update of the Arc cluster will begin on Sep 15.  This will result in fewer available resources while the compute nodes are restarted.  The login node and scheduler will be restarted on Sep 17.&lt;br /&gt;
&lt;br /&gt;
⚠️ Bulk Filesystem Maintenance September 17&lt;br /&gt;
The filer that provides the /bulk filesystem will be down for emergency repairs at 9 AM on Wednesday Sep 17th. No access to files on /bulk will be possible for the duration of the multi-hour outage. Any jobs running that access /bulk will start and then pause when access to /bulk is attempted. Jobs should continue once service is restored.  We apologize for any inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Update and Bulk outage on ARC&lt;br /&gt;
| date = 2025/09/18&lt;br /&gt;
| message = Arc update complete&lt;br /&gt;
&lt;br /&gt;
The update to ARC is complete.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU partition changes on ARC&lt;br /&gt;
| date = 2025/09/23&lt;br /&gt;
| message = gpu-v100 has 6 more nodes. You can view more details about the node specs by running the arc.hardware script on the login node. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2025/11/4&lt;br /&gt;
| message =&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Legacy compute nodes are being retired&lt;br /&gt;
| date = 2025/11/20&lt;br /&gt;
| message = ARC nodes cn[0513-1096] are being removed from the ARC cluster. They will be removed from scheduling and then retired from the cluster over the coming weeks. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Navbox ARC}}&lt;br /&gt;
[[Category:ARC]]&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3944</id>
		<title>ARC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3944"/>
		<updated>2025-11-04T16:43:30Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 16, 2023, the ARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The ARC login node will reboot on the morning of January 16. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by January 20.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/24&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS Updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/02/28&lt;br /&gt;
| message =&lt;br /&gt;
We are currently investigating a filesystem issue that is causing filesystem slowdowns across ARC.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/03/1&lt;br /&gt;
| message =&lt;br /&gt;
We are still investigating a filesystem issue that is causing filesystem slowdowns across ARC. Some jobs on ARC have been paused to help us find the root cause of the slowdowns.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ARC Login node reboot&lt;br /&gt;
| date = 2023/03/2&lt;br /&gt;
| message =&lt;br /&gt;
The ARC login node will be rebooted this afternoon for emergency maintenance. This downtime is needed to help mitigate the filesystem slowdowns experienced on the login node.  Jobs will continue running and scheduling during this time.&lt;br /&gt;
&lt;br /&gt;
All logins to the ARC login node will be terminated at 3:00PM, and the node will remain unavailable until 4:00PM.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Filesystem Issues&lt;br /&gt;
| date = 2023/03/2&lt;br /&gt;
| message =&lt;br /&gt;
We are still investigating a filesystem issue that is causing filesystem slowdowns on specific nodes in our MSRDC location.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues Resolved&lt;br /&gt;
| date = 2023/03/10&lt;br /&gt;
| message =&lt;br /&gt;
We have upgraded the filesystem routers in our MSRDC location to address the performance issues.&lt;br /&gt;
&lt;br /&gt;
Please let us know if you experience any issues with the filesystem performance.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/05/01&lt;br /&gt;
| message =&lt;br /&gt;
On May 1, 2023, the ARC Open OnDemand node will be rebooted between 5PM and 6PM. Expected downtime will be approximately 15 minutes.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on ARC Login Node&lt;br /&gt;
| date = 2023/06/22&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the Arc login node. If apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to a security vulnerability that will be&lt;br /&gt;
patched soon.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Lattice, Single, cpu2013 partition changes&lt;br /&gt;
| date = 2023/07/13&lt;br /&gt;
| message =&lt;br /&gt;
The Lattice, Single, and cpu2013 partitions have all been decommissioned.  The Single&lt;br /&gt;
partition will be replaced by the nodes formerly in the cpu2013 partition, and the&lt;br /&gt;
replacement partition will be called single.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/10/17&lt;br /&gt;
| message =&lt;br /&gt;
Open OnDemand will be rebooted on October 17, 2023 for an update. It will be down for up to 30 minutes.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Storage Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2023/10/23&lt;br /&gt;
| message =&lt;br /&gt;
We will be performing storage upgrades on the MARC/ARC cluster on &lt;br /&gt;
November 16 and 17, 2023. To facilitate this, we will be throttling &lt;br /&gt;
down the number of jobs on both clusters while the upgrades are &lt;br /&gt;
performed.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/05/3&lt;br /&gt;
| message =&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Power Interruption&lt;br /&gt;
| date = 2024/05/07&lt;br /&gt;
| message = Arc experienced a brief power outage around 11AM on May 7, 2024.&lt;br /&gt;
Most compute nodes have rebooted or are rebooting.  Most jobs running at the time &lt;br /&gt;
were lost. Arc administrators are actively working on restarting compute &lt;br /&gt;
nodes. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation&lt;br /&gt;
| date = 2024/06/03&lt;br /&gt;
| message = Job submissions targeted to the GPU a100 partition will be &lt;br /&gt;
affected by a temporary reservation on the nodes to accommodate the RCS&lt;br /&gt;
summer school class taking place on 2024/Jun/10. The reservation will end &lt;br /&gt;
shortly afterwards. Please submit your jobs normally and the scheduler will &lt;br /&gt;
start them as soon as the nodes are available. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation Removed&lt;br /&gt;
| date = 2024/06/11&lt;br /&gt;
| message = GPU a100 nodes in ARC have been returned to normal scheduling.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/23&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable &lt;br /&gt;
between Sept 23 and Sept 27 inclusive (subject to change). All compute nodes &lt;br /&gt;
in cpu2019, cpu2021/2, and gpu-v100, plus most nodes from bigmem and gpu-a100, &lt;br /&gt;
will be affected. These nodes will return to service as soon as the work is complete.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues blocking our original maintenance window, most compute nodes that were taken offline on Monday have been brought back online today. An additional partial outage will occur starting next Tuesday for the same nodes.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control, the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Normal Scheduling Has Resumed&lt;br /&gt;
| date = 2024/10/08&lt;br /&gt;
| message = The ARC cluster has been successfully brought online and nodes are running jobs normally. We apologize for the extended downtime. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance &lt;br /&gt;
| date = 2024/12/11&lt;br /&gt;
| message = The ARC login node will be rebooted on Tuesday December 17 for scheduled maintenance. It will be down for a few minutes and return shortly. Job scheduling and jobs running on the cluster will not be affected. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/07&lt;br /&gt;
| message = The ARC cluster will be rebooted for OS updates on Monday, January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back; queued jobs will remain in the queue and will begin scheduling when the cluster is ready. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/15&lt;br /&gt;
| message = The ARC cluster will be down for maintenance and upgrades starting at 9AM on Monday, January 20, 2025, through Wednesday, January 22, 2025. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the duration of the upgrade window:&lt;br /&gt;
* Scheduling will be paused and new jobs will be queued. Any queued jobs will start scheduling only after the upgrade is complete.&lt;br /&gt;
* Access to files via the login node and arc-dtn will generally be available but intermittent. File transfers on the DTN node, including Globus file transfers, may be interrupted during this window.&lt;br /&gt;
&lt;br /&gt;
Please make sure to save your work prior to this outage window to avoid any loss of work.&lt;br /&gt;
&lt;br /&gt;
During this time the following changes will happen:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet will replace the 11-year-old, unsupported InfiniBand interconnect on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node (MPI) jobs running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, or cpu2022 (see the example submission after this list).&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer will be replaced. Access to /bulk will be unavailable on Wednesday, January 22, 2025.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system will be updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system will be upgraded.&lt;br /&gt;
&lt;br /&gt;
5. The Open OnDemand web portal will be upgraded.&lt;br /&gt;
&lt;br /&gt;
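As a sketch of the partition advice in item 1 (the node/task counts, the module name, and the application binary are placeholders, not a prescribed configuration):&lt;br /&gt;
&lt;pre&gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --partition=cpu2019&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=40&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
# Load an MPI stack (module name illustrative; check module avail)&lt;br /&gt;
module load openmpi&lt;br /&gt;
mpirun ./my_mpi_app&lt;br /&gt;
&lt;/pre&gt;&lt;br /&gt;
&lt;br /&gt;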
Please reach out to support@hpc.ucalgary.ca with any issues or concerns.&lt;br /&gt;
&lt;br /&gt;
⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
Update Jan 18, 2025&lt;br /&gt;
&lt;br /&gt;
Around 10AM, ARC experienced an electrical power brownout.  Some of the nodes (how many is unknown at this time) lost electrical power during this time, causing the loss of a number of running jobs.  &lt;br /&gt;
&lt;br /&gt;
Sorry for the inconvenience.  &lt;br /&gt;
&lt;br /&gt;
Since ARC is shutting down for maintenance on Monday, Jan 20, replacement jobs will likely not start unless they request a timelimit shorter than the time remaining until 8AM Monday.  &lt;br /&gt;
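&lt;br /&gt;
For example (job.sh is a placeholder script; roughly 46 hours remain between the Saturday brownout and 8AM Monday, so adjust the request to the actual time remaining):&lt;br /&gt;
&lt;pre&gt;&lt;br /&gt;
sbatch --time=45:00:00 job.sh   # short enough to finish before the 8AM Monday shutdown&lt;br /&gt;
&lt;/pre&gt;&lt;br /&gt;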
⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Maintenance Complete&lt;br /&gt;
| date = 2025/01/22&lt;br /&gt;
| message = The ARC cluster upgrade is complete.&lt;br /&gt;
&lt;br /&gt;
During this time the following changes happened:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet replaced the 11-year-old, unsupported InfiniBand interconnect on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node (MPI) jobs running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, or cpu2022.&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer was replaced successfully.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system was updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system was upgraded.&lt;br /&gt;
&lt;br /&gt;
5. The Open OnDemand web portal was upgraded.&lt;br /&gt;
&lt;br /&gt;
6. The Parallel partition was renamed to Legacy to reflect the lack of a high-speed interconnect for parallel MPI work, and it was restricted to a maximum of 4-node jobs.&lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns.&lt;br /&gt;
&lt;br /&gt;
Jan 23, 9:08AM&lt;br /&gt;
&lt;br /&gt;
Remount complete.  ARC is back in full service.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Module Command Upgrade&lt;br /&gt;
| date = 2025/02/03&lt;br /&gt;
| message = Upgrade of the module command&lt;br /&gt;
&lt;br /&gt;
On Tuesday, February 11, 2025, the module command will be upgraded to a new version on ARC. This should result in new capabilities and a slightly different visual experience when using the module command.  Loading modules is not expected to change.&lt;br /&gt;
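&lt;br /&gt;
A typical sequence, for reference (the module name shown is illustrative, not a specific module on ARC):&lt;br /&gt;
&lt;pre&gt;&lt;br /&gt;
module avail          # list software available as modules&lt;br /&gt;
module load gcc       # load a module (name illustrative)&lt;br /&gt;
module list           # show currently loaded modules&lt;br /&gt;
&lt;/pre&gt;&lt;br /&gt;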
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Module Command Upgraded&lt;br /&gt;
| date = 2025/02/12&lt;br /&gt;
| message = The module command was upgraded&lt;br /&gt;
&lt;br /&gt;
On Tuesday, February 12, 2025, the module command was upgraded to a new version on ARC. This should result in new capabilities and a slightly different visual experience when using the module command.  Loading modules should not have changed.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address down&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca Unavailable&lt;br /&gt;
&lt;br /&gt;
Please be informed that our RCS support email address (support@hpc.ucalgary.ca) is currently not working. We are working to restore it as soon as possible; please keep an eye on this space for updates. The clusters are operating normally, but support will not receive your messages at this time. We will begin responding as soon as service is restored. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address functional&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca is back&lt;br /&gt;
&lt;br /&gt;
support@hpc.ucalgary.ca has been repaired and RCS can be contacted there. If you reached out for assistance in recent days without a response, please follow up, as we may not have received your initial email. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Interactive Job Timelimit will be Enforced&lt;br /&gt;
| date = 2025/04/11&lt;br /&gt;
| message = To improve scheduling and job throughput efficiency on ARC, interactive jobs will be limited to a maximum of 5 hours of runtime.  Interactive jobs that are submitted with a timelimit over 5 hours will be rejected at submission time.  This change will be made on Monday, April 28, 2025.&lt;br /&gt;
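&lt;br /&gt;
A sketch of what will and will not be accepted once the limit is active (the partition default is used and the resource requests are placeholders):&lt;br /&gt;
&lt;pre&gt;&lt;br /&gt;
# Accepted: timelimit at or under 5 hours&lt;br /&gt;
salloc --time=05:00:00 --ntasks=1 --mem=4G&lt;br /&gt;
# Rejected at submission: timelimit over 5 hours&lt;br /&gt;
salloc --time=06:00:00 --ntasks=1 --mem=4G&lt;br /&gt;
&lt;/pre&gt;&lt;br /&gt;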
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Interactive Job Timelimit Is Now Enforced&lt;br /&gt;
| date = 2025/04/28&lt;br /&gt;
| message = To improve scheduling and job throughput efficiency on ARC, interactive jobs are now limited to a maximum of 5 hours of runtime.  Interactive jobs that are submitted with a timelimit over 5 hours will be rejected at submission time.&lt;br /&gt;
&lt;br /&gt;
Apr 29, 2025&lt;br /&gt;
To increase the security posture of the ARC cluster, administrators will be installing Trend Micro on the cluster login nodes over the week starting Apr 30.  Please report any inconsistencies to support@hpc.ucalgary.ca.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Bulk Filesystem Emergency Maintenance&lt;br /&gt;
| date = 2025/07/29&lt;br /&gt;
| message = The filer that provides the /bulk filesystem will be down for emergency repairs at 12 noon on Thursday, July 31.  No access to files on /bulk will be possible for the duration of the multi-hour outage.  Jobs that access /bulk will still start, but will pause when they attempt to access /bulk; they should continue once service is restored.  Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Bulk Filesystem Maintenance Complete&lt;br /&gt;
| date = 2025/08/01&lt;br /&gt;
| message = Maintenance on /bulk was completed successfully and all filesystems are back in service on ARC.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU partition changes on ARC&lt;br /&gt;
| date = 2025/08/15&lt;br /&gt;
| message = New gpu-h100 and gpu-l40 partitions are available for general scheduling with new GPU hardware. gpu-v100 also has 6 fewer nodes. You can view more details about the node specs by running the arc.hardware script on the login node. &lt;br /&gt;
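&lt;br /&gt;
For reference, one way to inspect the new partitions (arc.hardware is the script named above; the sinfo format flags are standard Slurm):&lt;br /&gt;
&lt;pre&gt;&lt;br /&gt;
arc.hardware                             # summary of node specs per partition&lt;br /&gt;
sinfo -p gpu-h100,gpu-l40 -o "%P %D %G"  # partition, node count, GPU resources&lt;br /&gt;
&lt;/pre&gt;&lt;br /&gt;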
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Update and Bulk outage on ARC&lt;br /&gt;
| date = 2025/09/15&lt;br /&gt;
| message = ⚠️ ARC Update September 15-19&lt;br /&gt;
&lt;br /&gt;
An update of the ARC cluster will begin on Sep 15.  This will result in fewer available resources while the compute nodes are restarted.  The login node and scheduler will be restarted on Sep 17.&lt;br /&gt;
&lt;br /&gt;
⚠️ Bulk Filesystem Maintenance September 17&lt;br /&gt;
The filer that provides the /bulk filesystem will be down for maintenance at 9AM on Wednesday, Sep 17. No access to files on /bulk will be possible for the duration of the multi-hour outage. Jobs that access /bulk will still start, but will pause when they attempt to access /bulk; they should continue once service is restored.  We apologize for any inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Update and Bulk outage on ARC&lt;br /&gt;
| date = 2025/09/18&lt;br /&gt;
| message = ARC update complete&lt;br /&gt;
&lt;br /&gt;
The update to ARC is complete.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU partition changes on ARC&lt;br /&gt;
| date = 2025/09/23&lt;br /&gt;
| message = gpu-v100 has 6 more nodes. You can view more details about the node specs by running the arc.hardware script on the login node. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2025/11/04&lt;br /&gt;
| message =&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Navbox ARC}}&lt;br /&gt;
[[Category:ARC]]&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3908</id>
		<title>ARC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3908"/>
		<updated>2025-09-23T16:01:57Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 16, 2023, the ARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The ARC login node will reboot on the morning of January 16. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by January 20.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/24&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS Updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/02/28&lt;br /&gt;
| message =&lt;br /&gt;
We are currently investigating a filesystem issue that is causing filesystem slowdowns across ARC.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/03/1&lt;br /&gt;
| message =&lt;br /&gt;
We are still currently investigating a filesystem issue that is causing filesystem slowdowns across ARC. Some jobs on ARC have been paused to help us find the root cause of the slowdowns.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ARC Login node reboot&lt;br /&gt;
| date = 2023/03/2&lt;br /&gt;
| message =&lt;br /&gt;
The ARC login node will be rebooted this afternoon for an emergency maintenance. This downtime is needed to help mitigate the filesystem slowdowns experienced on the login node.  Jobs will continue running and scheduling during this time.&lt;br /&gt;
&lt;br /&gt;
All logins to the ARC login node will be terminated at 3:00PM and will remain unavailable until 4:00PM.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Filesystem Issues&lt;br /&gt;
| date = 2023/03/2&lt;br /&gt;
| message =&lt;br /&gt;
We are still currently investigating a filesystem issue that is causing filesystem slowdowns on specific nodes in our MSRDC location.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues Resolved&lt;br /&gt;
| date = 2023/03/10&lt;br /&gt;
| message =&lt;br /&gt;
We have upgraded the filesystem routers in our MSRDC location to address the performance issues.&lt;br /&gt;
&lt;br /&gt;
Please let us know if you experience any issues with the filesystem performance.&lt;br /&gt;
&lt;br /&gt;
Thank-you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/05/01&lt;br /&gt;
| message =&lt;br /&gt;
On May 1, 2023, the ARC Open OnDemand node will be rebooted between 5PM and 6PM. Expected downtime will be approximately 15 minutes.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on ARC Login Node&lt;br /&gt;
| date = 2023/06/22&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the Arc login node. If apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to security vulnerability that will be&lt;br /&gt;
patched soon.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Lattice, Single, cpu2013 partition changes&lt;br /&gt;
| date = 2023/07/13&lt;br /&gt;
| message =&lt;br /&gt;
The Lattice and Single, and cpu2013 have all been decomissioned.  The Single&lt;br /&gt;
partition will be replaced by the nodes formerly in the cpu2013 partition but&lt;br /&gt;
will be called single.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/10/17&lt;br /&gt;
| message =&lt;br /&gt;
Open OnDemand will be rebooted on October 17, 2023 for an update. It will be down for up to 30 minutes.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Storage Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2023/10/23&lt;br /&gt;
| message =&lt;br /&gt;
We will be performing storage upgrades on the MARC/ARC cluster on &lt;br /&gt;
November 16 and 17, 2023. To facilitate this, we will be throttling &lt;br /&gt;
down the number of jobs on both clusters while the upgrades are &lt;br /&gt;
performed&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/05/3&lt;br /&gt;
| message =&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Power Interruption&lt;br /&gt;
| date = 2024/05/07&lt;br /&gt;
| message = Arc Experienced an brief power outage around 11AM May 7, 2024.&lt;br /&gt;
Most compute nodes have or are rebooting.  Most jobs running at this time &lt;br /&gt;
were lost. Arc administrators are actively working on restarting compute &lt;br /&gt;
nodes. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation&lt;br /&gt;
| date = 2024/06/03&lt;br /&gt;
| message = Job submissions targeted to the  GPU a100 partition will be &lt;br /&gt;
affected by a temporary reservation on the nodes to accommodate the RCS&lt;br /&gt;
summer school class taking place on 2024/Jun/10. Reservation will end &lt;br /&gt;
shortly after. Please submit your jobs normally and the scheduler will &lt;br /&gt;
start them as soon as the nodes are available. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation Removed&lt;br /&gt;
| date = 2024/06/11&lt;br /&gt;
| message = GPU a100 Nodes in ARC have been returned to normal scheduling. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/23&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable &lt;br /&gt;
between Sept 23 to Sept 27 inclusive (subject to change). All compute nodes &lt;br /&gt;
in cpu2019, cpu2021/2, gpu-v100 most nodes from bigmem and gpu-a100 will be &lt;br /&gt;
affected. These nodes will return to service as soon as the work is complete.  &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues that is blocking our original maintenance window, most compute nodes that were taken offline on Monday has been brought back online today. An additional partial outage will occur again starting next Tuesday for the same nodes.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024.&lt;br /&gt;
&lt;br /&gt;
We apologise for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Normal Scheduling has resumed. &lt;br /&gt;
| date = 2024/10/08&lt;br /&gt;
| message = The ARC cluster has been successfully brought online and nodes are running jobs normally. We apologize for the extended downtime. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance &lt;br /&gt;
| date = 2024/12/11&lt;br /&gt;
| message = The ARC login node will be rebooted on Tuesday December 17 for scheduled maintenance. It will be down for a few minutes and return shortly. Job scheduling and jobs running on the cluster will not be affected. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/07&lt;br /&gt;
| message = The ARC cluster will be rebooted for OS updates on Monday January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back, but queued jobs will remain in the queue and nodes will start scheduling when the cluster is ready. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/15&lt;br /&gt;
| message = The ARC cluster will be down for maintenance and upgrades starting 9AM Monday, January 20, 2025 through Wednesday, January 22, 2025. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the duration of the upgrade window:&lt;br /&gt;
* Scheduling will be paused and new jobs will be queued. Any queued jobs will start scheduling only after the upgrade is complete.&lt;br /&gt;
* Access to files via the login node and arc-dtn will generally be available but intermittent. File transfers on the DTN node, including Globus file transfers, may be interrupted during this window.&lt;br /&gt;
&lt;br /&gt;
Please make sure to save your work prior to this outage window to avoid any loss of work.&lt;br /&gt;
&lt;br /&gt;
During this time the following changes will happen:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet will replace the 11 year old, unsupported Infiniband on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node jobs (MPI) running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, cpu2022.&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer will be replaced. Access to /bulk will be unavailable on Wednesday, January 22, 2025.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system will be updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system will be upgraded.&lt;br /&gt;
&lt;br /&gt;
5. The Open OnDemand web portal will be upgraded.&lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.calgary.ca with any issues or concerns.&lt;br /&gt;
&lt;br /&gt;
⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
Update Jan 18, 2025&lt;br /&gt;
&lt;br /&gt;
Around 10AM Arc experienced an electrical power brownout.  Some percentage (how many is unknown at this time) of the nodes lost electrical power during this time causing a loss of a number of running jobs.  &lt;br /&gt;
&lt;br /&gt;
Sorry for the inconvenience.  &lt;br /&gt;
&lt;br /&gt;
Since Arc is shutting down for maintenance on Monday Jan 20, replacement jobs will likely not start unless they request a timelimit less than the time until 8AM Monday.  &lt;br /&gt;
⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Maintenance Complete&lt;br /&gt;
| date = 2025/01/22&lt;br /&gt;
| message = The ARC cluster upgrade is complete&lt;br /&gt;
&lt;br /&gt;
During this time the following changes happened:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet will replace the 11 year old, unsupported Infiniband on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node jobs (MPI) running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, cpu2022.&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer was replaced successfully.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating was updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system was upgraded.&lt;br /&gt;
&lt;br /&gt;
5. The Open OnDemand web portal was upgraded.&lt;br /&gt;
&lt;br /&gt;
6. The Parallel partition was renamed to Legacy to show the lack of an interconnect for parallel MPI work and was restricted to maximum 4 node jobs.&lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.calgary.ca with any issues or concerns.&lt;br /&gt;
&lt;br /&gt;
Jan 23, 9:08AM&lt;br /&gt;
&lt;br /&gt;
Remount complete.  arc is back in full service.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Module Command Upgrade&lt;br /&gt;
| date = 2025/02/03&lt;br /&gt;
| message = Upgrade of the module command&lt;br /&gt;
&lt;br /&gt;
On Tuesday, February 11, 2025 the module command will be upgraded to a new verson on Arc. This should result in new capabilities and a slightly different visual experience when using the module command.  Loading modules is not expected to change.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Module Command Upgraded&lt;br /&gt;
| date = 2025/02/12&lt;br /&gt;
| message = The module command was upgraded&lt;br /&gt;
&lt;br /&gt;
On Tuesday, February 12, 2025 the module command was upgraded to a new verson on Arc. This should result in new capabilities and a slightly different visual experience when using the module command.  Loading modules should not have changed.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address down&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca Unavailable&lt;br /&gt;
&lt;br /&gt;
Please be informed that our support email address (support@hpc.ucalgary.ca) for RCS is currently not working. We are working to bring it back as soon as possible. Please keep an eye on this space for updates. The clusters are working normally, but support will not receive your messages at this time. We will begin responding as soon as we can get it back. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address functional&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca is back&lt;br /&gt;
&lt;br /&gt;
support@hpc.ucalgary.ca has been repaired and RCS can be contacted there. If you had reached out for assistance in recent days without response please follow up as we may not have received your initial email. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Interactive Job Timelimit will be Enforced&lt;br /&gt;
| date = 2025/04/11&lt;br /&gt;
| message = In order to improve the scheduling and job throughput efficiency of ARC, interactive jobs will be limited to a maximum of 5 hours of runtime.  Interactive jobs that are submitted with a timelimit over 5 hours will be rejected at submission time.  This change will be made on Monday, April 28, 2025.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Interactive Job Timelimit Is Now Enforced&lt;br /&gt;
| date = 2025/04/28&lt;br /&gt;
| message = In order to improve the scheduling and job throughput efficiency of ARC, interactive jobs are now limited to a maximum of 5 hours of runtime.  Interactive jobs that are submitted with a timelimit over 5 hours will be rejected at submission time.&lt;br /&gt;
&lt;br /&gt;
Apr 29, 2025&lt;br /&gt;
To increase the security posture of the Arc cluster administrators will be installing Trend Micro on cluster login nodes over the week starting Apr 30.  Please report any inconsistencies to support@hpc.ucalgary.ca&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Bulk Filesystem Emergency Maintenance&lt;br /&gt;
| date = 2025/07/29&lt;br /&gt;
| message = The filer that provides the /bulk filesystem will be down for emergency repairs at 12 Noon on Thursday July 31.  No access to files on /bulk will be possible for the duration of the multi-hour outage.  Any jobs running that access /bulk will start and then pause when access to /bulk is attempted.  Jobs should continue once service is restored.  Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Bulk Filesystem Maintenance Complete&lt;br /&gt;
| date = 2025/08/01&lt;br /&gt;
| message = Maintenance on /bulk was completed successfully and all filesystems are back in service on Arc.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU partition changes on ARC&lt;br /&gt;
| date = 2025/08/15&lt;br /&gt;
| message = New gpu-h100 and gpu-l40 partitions are available for general scheduling with new gpu hardware. gpu-v100 also has 6 fewer nodes. You can view more details about the node specs by running the arc.hardware script on the login node. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Update and Bulk outage on ARC&lt;br /&gt;
| date = 2025/09/15&lt;br /&gt;
| message = ⚠️ Arc Update September 15-19&lt;br /&gt;
&lt;br /&gt;
An update of the Arc cluster will begin on Sep 15.  This will result in fewer resources while the compute nodes are restarted.  The login node and Scheduler will be restarted on Sep 17.&lt;br /&gt;
&lt;br /&gt;
⚠️ Bulk Filesystem Maintenance September 17&lt;br /&gt;
The filer that provides the /bulk filesystem will be down for emergency repairs at 9 AM on Wednesday Sep 17th. No access to files on /bulk will be possible for the duration of the multi-hour outage. Any jobs running that access /bulk will start and then pause when access to /bulk is attempted. Jobs should continue once service is restored.  We apologize for any inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Update and Bulk outage on ARC&lt;br /&gt;
| date = 2025/09/18&lt;br /&gt;
| message = Arc update complete&lt;br /&gt;
&lt;br /&gt;
The update to arc is complete.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU partition changes on ARC&lt;br /&gt;
| date = 2025/09/23&lt;br /&gt;
| message = gpu-v100 has 6 more nodes. You can view more details about the node specs by running the arc.hardware script on the login node. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Navbox ARC}}&lt;br /&gt;
[[Category:ARC]]&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3823</id>
		<title>ARC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3823"/>
		<updated>2025-08-15T17:38:57Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 16, 2023, the ARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The ARC login node will reboot on the morning of January 16. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by January 20.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/24&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS Updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/02/28&lt;br /&gt;
| message =&lt;br /&gt;
We are currently investigating a filesystem issue that is causing filesystem slowdowns across ARC.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/03/1&lt;br /&gt;
| message =&lt;br /&gt;
We are still currently investigating a filesystem issue that is causing filesystem slowdowns across ARC. Some jobs on ARC have been paused to help us find the root cause of the slowdowns.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ARC Login node reboot&lt;br /&gt;
| date = 2023/03/2&lt;br /&gt;
| message =&lt;br /&gt;
The ARC login node will be rebooted this afternoon for an emergency maintenance. This downtime is needed to help mitigate the filesystem slowdowns experienced on the login node.  Jobs will continue running and scheduling during this time.&lt;br /&gt;
&lt;br /&gt;
All logins to the ARC login node will be terminated at 3:00PM and will remain unavailable until 4:00PM.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Filesystem Issues&lt;br /&gt;
| date = 2023/03/2&lt;br /&gt;
| message =&lt;br /&gt;
We are still currently investigating a filesystem issue that is causing filesystem slowdowns on specific nodes in our MSRDC location.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues Resolved&lt;br /&gt;
| date = 2023/03/10&lt;br /&gt;
| message =&lt;br /&gt;
We have upgraded the filesystem routers in our MSRDC location to address the performance issues.&lt;br /&gt;
&lt;br /&gt;
Please let us know if you experience any issues with the filesystem performance.&lt;br /&gt;
&lt;br /&gt;
Thank-you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/05/01&lt;br /&gt;
| message =&lt;br /&gt;
On May 1, 2023, the ARC Open OnDemand node will be rebooted between 5PM and 6PM. Expected downtime will be approximately 15 minutes.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on ARC Login Node&lt;br /&gt;
| date = 2023/06/22&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the Arc login node. If apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to security vulnerability that will be&lt;br /&gt;
patched soon.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Lattice, Single, cpu2013 partition changes&lt;br /&gt;
| date = 2023/07/13&lt;br /&gt;
| message =&lt;br /&gt;
The Lattice and Single, and cpu2013 have all been decomissioned.  The Single&lt;br /&gt;
partition will be replaced by the nodes formerly in the cpu2013 partition but&lt;br /&gt;
will be called single.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/10/17&lt;br /&gt;
| message =&lt;br /&gt;
Open OnDemand will be rebooted on October 17, 2023 for an update. It will be down for up to 30 minutes.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Storage Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2023/10/23&lt;br /&gt;
| message =&lt;br /&gt;
We will be performing storage upgrades on the MARC/ARC cluster on &lt;br /&gt;
November 16 and 17, 2023. To facilitate this, we will be throttling &lt;br /&gt;
down the number of jobs on both clusters while the upgrades are &lt;br /&gt;
performed&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/05/3&lt;br /&gt;
| message =&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Power Interruption&lt;br /&gt;
| date = 2024/05/07&lt;br /&gt;
| message = Arc Experienced an brief power outage around 11AM May 7, 2024.&lt;br /&gt;
Most compute nodes have or are rebooting.  Most jobs running at this time &lt;br /&gt;
were lost. Arc administrators are actively working on restarting compute &lt;br /&gt;
nodes. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation&lt;br /&gt;
| date = 2024/06/03&lt;br /&gt;
| message = Job submissions targeted to the  GPU a100 partition will be &lt;br /&gt;
affected by a temporary reservation on the nodes to accommodate the RCS&lt;br /&gt;
summer school class taking place on 2024/Jun/10. Reservation will end &lt;br /&gt;
shortly after. Please submit your jobs normally and the scheduler will &lt;br /&gt;
start them as soon as the nodes are available. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation Removed&lt;br /&gt;
| date = 2024/06/11&lt;br /&gt;
| message = GPU a100 Nodes in ARC have been returned to normal scheduling. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/23&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable &lt;br /&gt;
between Sept 23 to Sept 27 inclusive (subject to change). All compute nodes &lt;br /&gt;
in cpu2019, cpu2021/2, gpu-v100 most nodes from bigmem and gpu-a100 will be &lt;br /&gt;
affected. These nodes will return to service as soon as the work is complete.  &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues that is blocking our original maintenance window, most compute nodes that were taken offline on Monday has been brought back online today. An additional partial outage will occur again starting next Tuesday for the same nodes.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024.&lt;br /&gt;
&lt;br /&gt;
We apologise for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Normal Scheduling has resumed. &lt;br /&gt;
| date = 2024/10/08&lt;br /&gt;
| message = The ARC cluster has been successfully brought online and nodes are running jobs normally. We apologize for the extended downtime. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance &lt;br /&gt;
| date = 2024/12/11&lt;br /&gt;
| message = The ARC login node will be rebooted on Tuesday December 17 for scheduled maintenance. It will be down for a few minutes and return shortly. Job scheduling and jobs running on the cluster will not be affected. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/07&lt;br /&gt;
| message = The ARC cluster will be rebooted for OS updates on Monday January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back, but queued jobs will remain in the queue and nodes will start scheduling when the cluster is ready. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/15&lt;br /&gt;
| message = The ARC cluster will be down for maintenance and upgrades starting 9AM Monday, January 20, 2025 through Wednesday, January 22, 2025. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the duration of the upgrade window:&lt;br /&gt;
* Scheduling will be paused and new jobs will be queued. Any queued jobs will start scheduling only after the upgrade is complete.&lt;br /&gt;
* Access to files via the login node and arc-dtn will generally be available but intermittent. File transfers on the DTN node, including Globus file transfers, may be interrupted during this window.&lt;br /&gt;
&lt;br /&gt;
Please make sure to save your work prior to this outage window to avoid any loss of work.&lt;br /&gt;
&lt;br /&gt;
During this time the following changes will happen:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet will replace the 11 year old, unsupported Infiniband on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node jobs (MPI) running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, cpu2022.&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer will be replaced. Access to /bulk will be unavailable on Wednesday, January 22, 2025.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system will be updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system will be upgraded.&lt;br /&gt;
&lt;br /&gt;
5. The Open OnDemand web portal will be upgraded.&lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.calgary.ca with any issues or concerns.&lt;br /&gt;
&lt;br /&gt;
⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
Update Jan 18, 2025&lt;br /&gt;
&lt;br /&gt;
Around 10AM Arc experienced an electrical power brownout.  Some percentage (how many is unknown at this time) of the nodes lost electrical power during this time causing a loss of a number of running jobs.  &lt;br /&gt;
&lt;br /&gt;
Sorry for the inconvenience.  &lt;br /&gt;
&lt;br /&gt;
Since Arc is shutting down for maintenance on Monday Jan 20, replacement jobs will likely not start unless they request a timelimit less than the time until 8AM Monday.  &lt;br /&gt;
⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Maintenance Complete&lt;br /&gt;
| date = 2025/01/22&lt;br /&gt;
| message = The ARC cluster upgrade is complete&lt;br /&gt;
&lt;br /&gt;
During this time the following changes happened:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet will replace the 11 year old, unsupported Infiniband on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node jobs (MPI) running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, cpu2022.&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer was replaced successfully.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating was updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system was upgraded.&lt;br /&gt;
&lt;br /&gt;
5. The Open OnDemand web portal was upgraded.&lt;br /&gt;
&lt;br /&gt;
6. The Parallel partition was renamed to Legacy to show the lack of an interconnect for parallel MPI work and was restricted to maximum 4 node jobs.&lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.calgary.ca with any issues or concerns.&lt;br /&gt;
&lt;br /&gt;
Jan 23, 9:08AM&lt;br /&gt;
&lt;br /&gt;
Remount complete.  arc is back in full service.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Module Command Upgrade&lt;br /&gt;
| date = 2025/02/03&lt;br /&gt;
| message = Upgrade of the module command&lt;br /&gt;
&lt;br /&gt;
On Tuesday, February 11, 2025 the module command will be upgraded to a new verson on Arc. This should result in new capabilities and a slightly different visual experience when using the module command.  Loading modules is not expected to change.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Module Command Upgraded&lt;br /&gt;
| date = 2025/02/12&lt;br /&gt;
| message = The module command was upgraded&lt;br /&gt;
&lt;br /&gt;
On Tuesday, February 12, 2025 the module command was upgraded to a new verson on Arc. This should result in new capabilities and a slightly different visual experience when using the module command.  Loading modules should not have changed.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address down&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca Unavailable&lt;br /&gt;
&lt;br /&gt;
Please be informed that our support email address (support@hpc.ucalgary.ca) for RCS is currently not working. We are working to bring it back as soon as possible. Please keep an eye on this space for updates. The clusters are working normally, but support will not receive your messages at this time. We will begin responding as soon as we can get it back. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address functional&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca is back&lt;br /&gt;
&lt;br /&gt;
support@hpc.ucalgary.ca has been repaired and RCS can be contacted there. If you had reached out for assistance in recent days without response please follow up as we may not have received your initial email. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Interactive Job Timelimit will be Enforced&lt;br /&gt;
| date = 2025/04/11&lt;br /&gt;
| message = In order to improve the scheduling and job throughput efficiency of ARC, interactive jobs will be limited to a maximum of 5 hours of runtime.  Interactive jobs that are submitted with a timelimit over 5 hours will be rejected at submission time.  This change will be made on Monday, April 28, 2025.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Interactive Job Timelimit Is Now Enforced&lt;br /&gt;
| date = 2025/04/28&lt;br /&gt;
| message = In order to improve the scheduling and job throughput efficiency of ARC, interactive jobs are now limited to a maximum of 5 hours of runtime.  Interactive jobs that are submitted with a timelimit over 5 hours will be rejected at submission time.&lt;br /&gt;
&lt;br /&gt;
Apr 29, 2025&lt;br /&gt;
To increase the security posture of the Arc cluster administrators will be installing Trend Micro on cluster login nodes over the week starting Apr 30.  Please report any inconsistencies to support@hpc.ucalgary.ca&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Bulk Filesystem Emergency Maintenance&lt;br /&gt;
| date = 2025/07/29&lt;br /&gt;
| message = The filer that provides the /bulk filesystem will be down for emergency repairs starting at 12 noon on Thursday, July 31. No access to files on /bulk will be possible for the duration of the multi-hour outage. Jobs that access /bulk will still start, but will pause when they attempt to reach /bulk; they should continue once service is restored. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Bulk Filesystem Maintenance Complete&lt;br /&gt;
| date = 2025/08/01&lt;br /&gt;
| message = Maintenance on /bulk was completed successfully and all filesystems are back in service on Arc.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU partition changes on ARC&lt;br /&gt;
| date = 2025/08/15&lt;br /&gt;
| message = New gpu-h100 and gpu-l40 partitions are available for general scheduling with new GPU hardware. The gpu-v100 partition also has 6 fewer nodes. You can view more details about the node specs by running the arc.hardware script on the login node. &lt;br /&gt;
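&lt;br /&gt;
As a minimal sketch (job script name and GPU count are illustrative), the new partitions can be inspected and targeted as follows:&lt;br /&gt;
&lt;br /&gt;
 # site script named above; shows node specs (run on the login node)&lt;br /&gt;
 arc.hardware&lt;br /&gt;
 # standard Slurm view of the new partitions&lt;br /&gt;
 sinfo --partition=gpu-h100,gpu-l40&lt;br /&gt;
 # example submission requesting one H100 GPU&lt;br /&gt;
 sbatch --partition=gpu-h100 --gres=gpu:1 job.sh&lt;br /&gt;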
}}&lt;br /&gt;
&lt;br /&gt;
{{Navbox ARC}}&lt;br /&gt;
[[Category:ARC]]&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=MARC_Cluster_Status&amp;diff=3795</id>
		<title>MARC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=MARC_Cluster_Status&amp;diff=3795"/>
		<updated>2025-06-09T17:57:06Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Cluster Status&lt;br /&gt;
| cluster = MARC&lt;br /&gt;
| status = green&lt;br /&gt;
| title = Cluster operational&lt;br /&gt;
| message = See the [[MARC Cluster Status]] page for system notices. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 23, 2023, the MARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The MARC login node will reboot on the morning of January 23. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by January 27.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/24&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS Updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted (see the sketch below)&lt;br /&gt;
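&lt;br /&gt;
A minimal sketch of what the per-job mounts mean in practice (input and program names are illustrative): files written under /tmp are now private to the job and cannot collide with other jobs sharing the node.&lt;br /&gt;
&lt;br /&gt;
 # inside a job script: /tmp is private to this job&lt;br /&gt;
 cp input.dat /tmp/&lt;br /&gt;
 ./my_program /tmp/input.dat&lt;br /&gt;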
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on MARC Login Node&lt;br /&gt;
| date = 2023/06/23&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the MARC login node. If apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to a security vulnerability that will be&lt;br /&gt;
patched soon.&lt;br /&gt;
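&lt;br /&gt;
A minimal sketch of the workaround (image name, program, and resource values are illustrative):&lt;br /&gt;
&lt;br /&gt;
 # run the container on a compute node inside a batch job,&lt;br /&gt;
 # not on the login node&lt;br /&gt;
 sbatch --time=01:00:00 --mem=4G --wrap="apptainer exec mycontainer.sif ./my_program"&lt;br /&gt;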
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Storage Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2023/10/23&lt;br /&gt;
| message =&lt;br /&gt;
We will be performing storage upgrades on the MARC/ARC cluster on &lt;br /&gt;
November 16 and 17, 2023. To facilitate this, we will be throttling &lt;br /&gt;
down the number of jobs on both clusters while the upgrades are &lt;br /&gt;
performed.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = OS Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2024/09/11&lt;br /&gt;
| message =&lt;br /&gt;
MARC will be going down for OS upgrades on 2024/Sep/16. The cluster &lt;br /&gt;
will be unavailable temporarily to complete this work. Please contact&lt;br /&gt;
support@hpc.ucalgary.ca if you have any questions or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance &lt;br /&gt;
| date = 2024/12/11&lt;br /&gt;
| message = The MARC login node will be rebooted on Tuesday December 17 for scheduled maintenance. It will be down for a few minutes and return shortly. Job scheduling and jobs running on the cluster will not be affected. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/07&lt;br /&gt;
| message = The MARC cluster will be rebooted for OS updates on Monday January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back, but queued jobs will remain in the queue and nodes will start scheduling when the cluster is ready. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address down&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca Unavailable&lt;br /&gt;
&lt;br /&gt;
Please be informed that our RCS support email address (support@hpc.ucalgary.ca) is currently not working. We are working to restore it as soon as possible; please keep an eye on this space for updates. The clusters are operating normally, but support will not receive your messages at this time. We will begin responding as soon as the address is restored. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address functional&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca is back&lt;br /&gt;
&lt;br /&gt;
support@hpc.ucalgary.ca has been repaired and RCS can be contacted there. If you reached out for assistance in recent days without a response, please follow up, as we may not have received your initial email. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = MARC Scheduled File System Maintenance&lt;br /&gt;
| date = 2025/06/09&lt;br /&gt;
| message = Please be advised that MARC will be going down for approximately 2 hours starting at 10 AM on June 17, 2025. Logins will not be available and no jobs will be running during this window. &lt;br /&gt;
&lt;br /&gt;
Please send any questions or concerns to support@hpc.ucalgary.ca. Apologies for the inconvenience.&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=MARC_Cluster_Status&amp;diff=3794</id>
		<title>MARC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=MARC_Cluster_Status&amp;diff=3794"/>
		<updated>2025-06-09T17:55:40Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Cluster Status&lt;br /&gt;
| cluster = MARC&lt;br /&gt;
| status = green&lt;br /&gt;
| title = Cluster operational&lt;br /&gt;
| message = Upgrades are planned for Jan 20 2025. Please contact us if you experience system issues.&lt;br /&gt;
See the [[MARC Cluster Status]] page for system notices. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 23, 2023, the MARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The MARC login node will reboot on the morning of January 23. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by January 27.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/24&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS Updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on MARC Login Node&lt;br /&gt;
| date = 2023/06/23&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the MARC login node. If apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to a security vulnerability that will be&lt;br /&gt;
patched soon.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Storage Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2023/10/23&lt;br /&gt;
| message =&lt;br /&gt;
We will be performing storage upgrades on the MARC/ARC cluster on &lt;br /&gt;
November 16 and 17, 2023. To facilitate this, we will be throttling &lt;br /&gt;
down the number of jobs on both clusters while the upgrades are &lt;br /&gt;
performed.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = OS Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2024/09/11&lt;br /&gt;
| message =&lt;br /&gt;
MARC will be going down for OS upgrades on 2024/Sep/16. The cluster &lt;br /&gt;
will be unavailable temporarily to complete this work. Please contact&lt;br /&gt;
support@hpc.ucalgary.ca if you have any questions or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance &lt;br /&gt;
| date = 2024/12/11&lt;br /&gt;
| message = The MARC login node will be rebooted on Tuesday December 17 for scheduled maintenance. It will be down for a few minutes and return shortly. Job scheduling and jobs running on the cluster will not be affected. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/07&lt;br /&gt;
| message = The MARC cluster will be rebooted for OS updates on Monday January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back, but queued jobs will remain in the queue and nodes will start scheduling when the cluster is ready. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address down&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca Unavailable&lt;br /&gt;
&lt;br /&gt;
Please be informed that our RCS support email address (support@hpc.ucalgary.ca) is currently not working. We are working to restore it as soon as possible; please keep an eye on this space for updates. The clusters are operating normally, but support will not receive your messages at this time. We will begin responding as soon as the address is restored. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address functional&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca is back&lt;br /&gt;
&lt;br /&gt;
support@hpc.ucalgary.ca has been repaired and RCS can be contacted there. If you reached out for assistance in recent days without a response, please follow up, as we may not have received your initial email. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = MARC Scheduled File System Maintenance&lt;br /&gt;
| date = 2025/06/09&lt;br /&gt;
| message = Please be advised that MARC will be going down for approximately 2 hours starting at 10 AM on June 17, 2025. Logins will not be available and no jobs will be running during this window. &lt;br /&gt;
&lt;br /&gt;
Please send any questions or concerns to support@hpc.ucalgary.ca. Apologies for the inconvenience.&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=Altis_Login_Node_Status&amp;diff=3756</id>
		<title>Altis Login Node Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=Altis_Login_Node_Status&amp;diff=3756"/>
		<updated>2025-03-10T20:50:45Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/09/03&lt;br /&gt;
| message =&lt;br /&gt;
The ARC Cluster and the Altis login node are operational. No upcoming upgrades are planned.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/27&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable &lt;br /&gt;
between Sept 23 and Sept 27 inclusive (subject to change). Some Altis GPU nodes will be affected during this maintenance window. These nodes will return to service as soon as the work is complete.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues blocking our original maintenance window, most compute nodes that were taken offline on Monday have been brought back online today. An additional partial outage for the same nodes will begin next Tuesday.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologise for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control, the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Normal Scheduling has resumed. &lt;br /&gt;
| date = 2024/10/08&lt;br /&gt;
| message = The ARC cluster has been successfully brought online and nodes are running jobs normally. We apologize for the extended downtime. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = wdfgpu[1-12] System Update Reboots &lt;br /&gt;
| date = 2024/12/02&lt;br /&gt;
| message = wdfgpu[1-12] will be rebooted briefly today to install important system updates and will return to service shortly. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/07&lt;br /&gt;
| message = The ARC cluster will be rebooted for OS updates on Monday January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back, but queued jobs will remain in the queue and nodes will start scheduling when the cluster is ready. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/15&lt;br /&gt;
| message = The ARC cluster will be down for maintenance and upgrades starting 9AM Monday, January 20, 2025 through Wednesday, January 22, 2025. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the duration of the upgrade window:&lt;br /&gt;
* Scheduling will be paused and new jobs will be queued. Any queued jobs will start scheduling only after the upgrade is complete.&lt;br /&gt;
* Access to files via the login node and arc-dtn will generally be available but intermittent. File transfers on the DTN node, including Globus file transfers, may be interrupted during this window.&lt;br /&gt;
&lt;br /&gt;
Please make sure to save your work prior to this outage window to avoid any loss of work.&lt;br /&gt;
&lt;br /&gt;
During this time the following changes will happen:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet will replace the 11-year-old, unsupported InfiniBand on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node jobs (MPI) running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, or cpu2022 (see the example batch header after this list).&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer will be replaced. Access to /bulk will be unavailable on Wednesday, January 22, 2025.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system will be updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system will be upgraded.&lt;br /&gt;
&lt;br /&gt;
5. The Open OnDemand web portal will be upgraded.&lt;br /&gt;
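&lt;br /&gt;
As referenced in change 1 above, a minimal sketch of a batch header targeting a partition that keeps a low-latency interconnect (node, task, and time values are illustrative):&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --partition=cpu2019&lt;br /&gt;
 #SBATCH --nodes=2&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --time=02:00:00&lt;br /&gt;
 srun ./my_mpi_program&lt;br /&gt;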
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns.&lt;br /&gt;
&lt;br /&gt;
⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
Update Jan 18, 2025&lt;br /&gt;
&lt;br /&gt;
Around 10AM Altis experienced an electrical power brownout. An as-yet-unknown number of nodes lost electrical power during this time, causing the loss of a number of running jobs.&lt;br /&gt;
&lt;br /&gt;
Sorry for the inconvenience.  &lt;br /&gt;
&lt;br /&gt;
Since Altis is shutting down for maintenance on Monday Jan 20, replacement jobs will likely not start unless they request a timelimit shorter than the time remaining until 8AM Monday.&lt;br /&gt;
⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Maintenance Complete&lt;br /&gt;
| date = 2025/01/22&lt;br /&gt;
| message = The ARC/Altis cluster upgrade is complete&lt;br /&gt;
&lt;br /&gt;
During this time the following changes happened:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet replaced the 11-year-old, unsupported InfiniBand on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node jobs (MPI) running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, or cpu2022.&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer was replaced successfully.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system was updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system was upgraded.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address down&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca Unavailable&lt;br /&gt;
&lt;br /&gt;
Please be informed that our RCS support email address (support@hpc.ucalgary.ca) is currently not working. We are working to restore it as soon as possible; please keep an eye on this space for updates. The clusters are operating normally, but support will not receive your messages at this time. We will begin responding as soon as the address is restored. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address functional&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca is back&lt;br /&gt;
&lt;br /&gt;
support@hpc.ucalgary.ca has been repaired and RCS can be contacted there. If you reached out for assistance in recent days without a response, please follow up, as we may not have received your initial email. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
[[Category:ARC]]&lt;br /&gt;
{{Navbox ARC}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=Think_Login_Node_Status&amp;diff=3755</id>
		<title>Think Login Node Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=Think_Login_Node_Status&amp;diff=3755"/>
		<updated>2025-03-10T20:50:31Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/09/03&lt;br /&gt;
| message =&lt;br /&gt;
The ARC Cluster and the Think login node are operational. No upcoming upgrades are planned.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/23&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable &lt;br /&gt;
between Sept 23 and Sept 27 inclusive (subject to change). Some Think GPU nodes will be affected during this maintenance window. These nodes will return to service as soon as the work is complete.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues blocking our original maintenance window, most compute nodes that were taken offline on Monday have been brought back online today. An additional partial outage for the same nodes will begin next Tuesday.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024.&lt;br /&gt;
&lt;br /&gt;
We apologise for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control, the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Normal Scheduling has resumed. &lt;br /&gt;
| date = 2024/10/08&lt;br /&gt;
| message = The ARC cluster has been successfully brought online and nodes are running jobs normally. We apologize for the extended downtime. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = wdfgpu[1-12] System Update Reboots &lt;br /&gt;
| date = 2024/12/02&lt;br /&gt;
| message = wdfgpu[1-12] will be rebooted briefly today to install important system updates and will return to service shortly. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/07&lt;br /&gt;
| message = The ARC cluster will be rebooted for OS updates on Monday January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back, but queued jobs will remain in the queue and nodes will start scheduling when the cluster is ready. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/15&lt;br /&gt;
| message = The ARC cluster will be down for maintenance and upgrades starting 9AM Monday, January 20, 2025 through Wednesday, January 22, 2025. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the duration of the upgrade window:&lt;br /&gt;
* Scheduling will be paused and new jobs will be queued. Any queued jobs will start scheduling only after the upgrade is complete.&lt;br /&gt;
* Access to files via the login node and arc-dtn will generally be available but intermittent. File transfers on the DTN node, including Globus file transfers, may be interrupted during this window.&lt;br /&gt;
&lt;br /&gt;
Please make sure to save your work prior to this outage window to avoid any loss of work.&lt;br /&gt;
&lt;br /&gt;
During this time the following changes will happen:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet will replace the 11-year-old, unsupported InfiniBand on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node jobs (MPI) running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, or cpu2022.&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer will be replaced. Access to /bulk will be unavailable on Wednesday, January 22, 2025.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system will be updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system will be upgraded.&lt;br /&gt;
&lt;br /&gt;
5. The Open OnDemand web portal will be upgraded.&lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns.&lt;br /&gt;
&lt;br /&gt;
⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
Update Jan 18, 2025&lt;br /&gt;
&lt;br /&gt;
Around 10AM Arc experienced an electrical power brownout. An as-yet-unknown number of nodes lost electrical power during this time, causing the loss of a number of running jobs.&lt;br /&gt;
&lt;br /&gt;
Sorry for the inconvenience.  &lt;br /&gt;
&lt;br /&gt;
Since Arc is shutting down for maintenance on Monday Jan 20, replacement jobs will likely not start unless they request a timelimit shorter than the time remaining until 8AM Monday.&lt;br /&gt;
⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Maintenance Complete&lt;br /&gt;
| date = 2025/01/22&lt;br /&gt;
| message = The ARC/Think cluster upgrade is complete&lt;br /&gt;
&lt;br /&gt;
During this time the following changes happened:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet replaced the 11-year-old, unsupported InfiniBand on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node jobs (MPI) running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, or cpu2022.&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer was replaced successfully.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system was updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system was upgraded.&lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address down&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca Unavailable&lt;br /&gt;
&lt;br /&gt;
Please be informed that our RCS support email address (support@hpc.ucalgary.ca) is currently not working. We are working to restore it as soon as possible; please keep an eye on this space for updates. The clusters are operating normally, but support will not receive your messages at this time. We will begin responding as soon as the address is restored. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address functional&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca is back&lt;br /&gt;
&lt;br /&gt;
support@hpc.ucalgary.ca has been repaired and RCS can be contacted there. If you reached out for assistance in recent days without a response, please follow up, as we may not have received your initial email. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
[[Category:ARC]]&lt;br /&gt;
{{Navbox ARC}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=TALC_Cluster_Status&amp;diff=3754</id>
		<title>TALC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=TALC_Cluster_Status&amp;diff=3754"/>
		<updated>2025-03-10T20:49:57Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{TALC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ May System Updates&lt;br /&gt;
| date = 2023/02/02&lt;br /&gt;
| message =&lt;br /&gt;
Beginning May 1, 2023, the TALC cluster will undergo operating system updates. The upgrade will happen after the end of term to minimize any disruption. Any existing jobs may be &lt;br /&gt;
temporarily held from scheduling. The upgrade is planned to be fully complete by May 5.&lt;br /&gt;
&lt;br /&gt;
The TALC login node will reboot on the morning of May 1.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = May System Updates Completed&lt;br /&gt;
| date = 2023/05/04&lt;br /&gt;
| message =&lt;br /&gt;
TALC upgrades have been completed. If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Power Interruption&lt;br /&gt;
| date = 2024/05/07&lt;br /&gt;
| message = TALC experienced a brief power outage around 11AM May 7, 2024.&lt;br /&gt;
Most compute nodes have rebooted or are rebooting. Most jobs running at this time &lt;br /&gt;
were lost. Administrators are actively working on restarting compute &lt;br /&gt;
nodes. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/06/26&lt;br /&gt;
| message = &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address down&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca Unavailable&lt;br /&gt;
&lt;br /&gt;
Please be informed that our RCS support email address (support@hpc.ucalgary.ca) is currently not working. We are working to restore it as soon as possible; please keep an eye on this space for updates. The clusters are operating normally, but support will not receive your messages at this time. We will begin responding as soon as the address is restored. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address functional&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca is back&lt;br /&gt;
&lt;br /&gt;
support@hpc.ucalgary.ca has been repaired and RCS can be contacted there. If you reached out for assistance in recent days without a response, please follow up, as we may not have received your initial email. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
[[Category:TALC]]&lt;br /&gt;
{{Navbox TALC}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=GLaDOS_Cluster_Status&amp;diff=3753</id>
		<title>GLaDOS Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=GLaDOS_Cluster_Status&amp;diff=3753"/>
		<updated>2025-03-10T20:49:35Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{GLaDOS Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 30, 2023, the GLaDOS cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The GLaDOS login node will reboot on the morning of January 30. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by February 3.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/31&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS Updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on GLaDOS Login Node&lt;br /&gt;
| date = 2023/06/23&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the GLaDOS login node. If apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to a security vulnerability that will be&lt;br /&gt;
patched soon.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GLaDOS Scheduled Temporary Shutdown for Move&lt;br /&gt;
| date = 2023/06/26&lt;br /&gt;
| message =&lt;br /&gt;
GLaDOS is scheduled to be shut down temporarily to allow for the &lt;br /&gt;
cluster to be physically moved, beginning Tuesday September 5, 2023.&lt;br /&gt;
The cluster is expected to be down the rest of the week and back &lt;br /&gt;
online on or before Monday the 11th.&lt;br /&gt;
&lt;br /&gt;
Please send any questions or concerns to support@hpc.ucalgary.ca&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GLaDOS Move Complete&lt;br /&gt;
| date = 2023/09/11&lt;br /&gt;
| message =&lt;br /&gt;
GLaDOS has been moved and jobs can be submitted for scheduling.&lt;br /&gt;
&lt;br /&gt;
Please send any questions or concerns to support@hpc.ucalgary.ca&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Power Interruption&lt;br /&gt;
| date = 2024/05/07&lt;br /&gt;
| message = GLaDOS experienced a brief power outage around 11AM May 7, 2024.&lt;br /&gt;
Most compute nodes have rebooted or are rebooting. Most jobs running at this time &lt;br /&gt;
were lost. Administrators are actively working on restarting compute &lt;br /&gt;
nodes. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = OS updates complete&lt;br /&gt;
| date = 2024/05/07&lt;br /&gt;
| message = GLaDOS has been updated to Rocky Linux 8.10 and is operating normally.&lt;br /&gt;
&lt;br /&gt;
Please reach out with any questions or concerns to support@hpc.ucalgary.ca&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address down&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca Unavailable&lt;br /&gt;
&lt;br /&gt;
Please be informed that our RCS support email address (support@hpc.ucalgary.ca) is currently not working. We are working to restore it as soon as possible; please keep an eye on this space for updates. The clusters are operating normally, but support will not receive your messages at this time. We will begin responding as soon as the address is restored. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address functional&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca is back&lt;br /&gt;
&lt;br /&gt;
support@hpc.ucalgary.ca has been repaired and RCS can be contacted there. If you reached out for assistance in recent days without a response, please follow up, as we may not have received your initial email. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
[[Category:GLaDOS]]&lt;br /&gt;
{{Navbox GLaDOS}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=MARC_Cluster_Status&amp;diff=3752</id>
		<title>MARC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=MARC_Cluster_Status&amp;diff=3752"/>
		<updated>2025-03-10T20:49:19Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Cluster Status&lt;br /&gt;
| cluster = MARC&lt;br /&gt;
| status = green&lt;br /&gt;
| title = Cluster operational&lt;br /&gt;
| message = Upgrades are planned for Jan 20 2025. Please contact us if you experience system issues.&lt;br /&gt;
See the [[MARC Cluster Status]] page for system notices. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 23, 2023, the MARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The MARC login node will reboot on the morning of January 23. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by January 27.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/24&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS Updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on MARC Login Node&lt;br /&gt;
| date = 2023/06/23&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the MARC login node. If apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to a security vulnerability that will be&lt;br /&gt;
patched soon.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Storage Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2023/10/23&lt;br /&gt;
| message =&lt;br /&gt;
We will be performing storage upgrades on the MARC/ARC cluster on &lt;br /&gt;
November 16 and 17, 2023. To facilitate this, we will be throttling &lt;br /&gt;
down the number of jobs on both clusters while the upgrades are &lt;br /&gt;
performed.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = OS Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2024/09/11&lt;br /&gt;
| message =&lt;br /&gt;
MARC will be going down for OS upgrades on 2024/Sep/16. The cluster &lt;br /&gt;
will be unavailable temporarily to complete this work. Please contact&lt;br /&gt;
support@hpc.ucalgary.ca if you have any questions or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance &lt;br /&gt;
| date = 2024/12/11&lt;br /&gt;
| message = The MARC login node will be rebooted on Tuesday December 17 for scheduled maintenance. It will be down for a few minutes and return shortly. Job scheduling and jobs running on the cluster will not be affected. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/07&lt;br /&gt;
| message = The MARC cluster will be rebooted for OS updates on Monday January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back, but queued jobs will remain in the queue and nodes will start scheduling when the cluster is ready. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address down&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca Unavailable&lt;br /&gt;
&lt;br /&gt;
Please be informed that our RCS support email address (support@hpc.ucalgary.ca) is currently not working. We are working to restore it as soon as possible; please keep an eye on this space for updates. The clusters are operating normally, but support will not receive your messages at this time. We will begin responding as soon as the address is restored. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address functional&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca is back&lt;br /&gt;
&lt;br /&gt;
support@hpc.ucalgary.ca has been repaired and RCS can be contacted there. If you reached out for assistance in recent days without a response, please follow up, as we may not have received your initial email. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3751</id>
		<title>ARC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3751"/>
		<updated>2025-03-10T20:48:55Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Cluster Status&lt;br /&gt;
| cluster = ARC&lt;br /&gt;
| status = green&lt;br /&gt;
| title = Cluster operational - Power Bump Jan 18&lt;br /&gt;
| message = System is operational. Updates are planned for Jan 20; please see the MOTD.&lt;br /&gt;
&lt;br /&gt;
See the [[ARC Cluster Status]] page for system notices. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 16, 2023, the ARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The ARC login node will reboot on the morning of January 16. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by January 20.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/24&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS Updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/02/28&lt;br /&gt;
| message =&lt;br /&gt;
We are currently investigating a filesystem issue that is causing filesystem slowdowns across ARC.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/03/01&lt;br /&gt;
| message =&lt;br /&gt;
We are still investigating a filesystem issue that is causing slowdowns across ARC. Some jobs on ARC have been paused to help us find the root cause of the slowdowns.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ARC Login node reboot&lt;br /&gt;
| date = 2023/03/02&lt;br /&gt;
| message =&lt;br /&gt;
The ARC login node will be rebooted this afternoon for emergency maintenance. This downtime is needed to help mitigate the filesystem slowdowns experienced on the login node. Jobs will continue running and scheduling during this time.&lt;br /&gt;
&lt;br /&gt;
All logins to the ARC login node will be terminated at 3:00PM and will remain unavailable until 4:00PM.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Filesystem Issues&lt;br /&gt;
| date = 2023/03/02&lt;br /&gt;
| message =&lt;br /&gt;
We are still investigating a filesystem issue that is causing slowdowns on specific nodes in our MSRDC location.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues Resolved&lt;br /&gt;
| date = 2023/03/10&lt;br /&gt;
| message =&lt;br /&gt;
We have upgraded the filesystem routers in our MSRDC location to address the performance issues.&lt;br /&gt;
&lt;br /&gt;
Please let us know if you experience any issues with the filesystem performance.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/05/01&lt;br /&gt;
| message =&lt;br /&gt;
On May 1, 2023, the ARC Open OnDemand node will be rebooted between 5PM and 6PM. Expected downtime will be approximately 15 minutes.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on ARC Login Node&lt;br /&gt;
| date = 2023/06/22&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the Arc login node. If apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to a security vulnerability that will be&lt;br /&gt;
patched soon.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Lattice, Single, cpu2013 partition changes&lt;br /&gt;
| date = 2023/07/13&lt;br /&gt;
| message =&lt;br /&gt;
The Lattice, Single, and cpu2013 partitions have all been decommissioned. The Single&lt;br /&gt;
partition will be replaced by the nodes formerly in the cpu2013 partition, but&lt;br /&gt;
it will be called single.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/10/17&lt;br /&gt;
| message =&lt;br /&gt;
Open OnDemand will be rebooted on October 17, 2023 for an update. It will be down for up to 30 minutes.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Storage Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2023/10/23&lt;br /&gt;
| message =&lt;br /&gt;
We will be performing storage upgrades on the MARC/ARC cluster on &lt;br /&gt;
November 16 and 17, 2023. To facilitate this, we will be throttling &lt;br /&gt;
down the number of jobs on both clusters while the upgrades are &lt;br /&gt;
performed.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/05/03&lt;br /&gt;
| message =&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Power Interruption&lt;br /&gt;
| date = 2024/05/07&lt;br /&gt;
| message = ARC experienced a brief power outage around 11AM May 7, 2024.&lt;br /&gt;
Most compute nodes have rebooted or are rebooting. Most jobs running at this time &lt;br /&gt;
were lost. ARC administrators are actively working on restarting compute &lt;br /&gt;
nodes. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation&lt;br /&gt;
| date = 2024/06/03&lt;br /&gt;
| message = Job submissions targeted to the GPU a100 partition will be &lt;br /&gt;
affected by a temporary reservation on the nodes to accommodate the RCS&lt;br /&gt;
summer school class taking place on 2024/Jun/10. The reservation will end &lt;br /&gt;
shortly afterwards. Please submit your jobs normally and the scheduler will &lt;br /&gt;
start them as soon as the nodes are available. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation Removed&lt;br /&gt;
| date = 2024/06/11&lt;br /&gt;
| message = GPU a100 Nodes in ARC have been returned to normal scheduling. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/23&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable &lt;br /&gt;
between Sept 23 and Sept 27 inclusive (subject to change). All compute nodes &lt;br /&gt;
in cpu2019, cpu2021/2, and gpu-v100, plus most nodes from bigmem and gpu-a100, will be &lt;br /&gt;
affected. These nodes will return to service as soon as the work is complete.  &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues blocking our original maintenance window, most compute nodes that were taken offline on Monday have been brought back online today. An additional partial outage for the same nodes will begin next Tuesday.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
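&lt;br /&gt;
The bracketed node list is standard Slurm hostlist notation; if it is unclear which hosts it covers, it can be expanded with scontrol (a sketch, assuming the usual Slurm tools):&lt;br /&gt;
&lt;pre&gt;&lt;br /&gt;
# Prints one hostname per line: wdfgpu1, wdfgpu2, wdfgpu6, and wdfgpu8 through wdfgpu12.&lt;br /&gt;
scontrol show hostnames 'wdfgpu[1-2,6,8-12]'&lt;br /&gt;
&lt;/pre&gt;&lt;br /&gt;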
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control, the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Normal Scheduling has resumed. &lt;br /&gt;
| date = 2024/10/08&lt;br /&gt;
| message = The ARC cluster has been successfully brought online and nodes are running jobs normally. We apologize for the extended downtime. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance &lt;br /&gt;
| date = 2024/12/11&lt;br /&gt;
| message = The ARC login node will be rebooted on Tuesday December 17 for scheduled maintenance. It will be down for a few minutes and return shortly. Job scheduling and jobs running on the cluster will not be affected. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/07&lt;br /&gt;
| message = The ARC cluster will be rebooted for OS updates on Monday January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back, but queued jobs will remain in the queue and nodes will start scheduling when the cluster is ready. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/15&lt;br /&gt;
| message = The ARC cluster will be down for maintenance and upgrades starting at 9AM Monday, January 20, 2025, and running through Wednesday, January 22, 2025. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the duration of the upgrade window:&lt;br /&gt;
* Scheduling will be paused and new jobs will be queued. Any queued jobs will start scheduling only after the upgrade is complete.&lt;br /&gt;
* Access to files via the login node and arc-dtn will generally be available but intermittent. File transfers on the DTN node, including Globus file transfers, may be interrupted during this window.&lt;br /&gt;
&lt;br /&gt;
Please make sure to save your work prior to this outage window to avoid any loss of work.&lt;br /&gt;
&lt;br /&gt;
During this time the following changes will happen:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet will replace the 11-year-old, unsupported InfiniBand on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node jobs (MPI) running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, or cpu2022 (see the sbatch sketch after this list).&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer will be replaced. Access to /bulk will be unavailable on Wednesday, January 22, 2025.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system will be updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system will be upgraded.&lt;br /&gt;
&lt;br /&gt;
5. The Open OnDemand web portal will be upgraded.&lt;br /&gt;
&lt;br /&gt;
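As noted in item 1, multi-node MPI work should target a partition that keeps a low-latency interconnect. A minimal sbatch sketch (the executable name is illustrative):&lt;br /&gt;
&lt;pre&gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --partition=cpu2019    # a partition that keeps its interconnect&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=4&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
srun ./my_mpi_program          # illustrative MPI executable&lt;br /&gt;
&lt;/pre&gt;&lt;br /&gt;
&lt;br /&gt;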
Please reach out to support@hpc.ucalgary.ca with any issues or concerns.&lt;br /&gt;
&lt;br /&gt;
⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
Update Jan 18, 2025&lt;br /&gt;
&lt;br /&gt;
Around 10AM, ARC experienced an electrical power brownout. An unknown percentage of the nodes lost power during this time, causing the loss of a number of running jobs.&lt;br /&gt;
&lt;br /&gt;
Sorry for the inconvenience.  &lt;br /&gt;
&lt;br /&gt;
Since ARC is shutting down for maintenance on Monday, Jan 20, replacement jobs will likely not start unless they request a time limit shorter than the time remaining until 8AM Monday.&lt;br /&gt;
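&lt;br /&gt;
For example, a resubmitted job could request a runtime that fits before the shutdown (the script name is illustrative):&lt;br /&gt;
&lt;pre&gt;&lt;br /&gt;
# Requests 4 hours; the limit must be shorter than the time remaining before 8AM Monday.&lt;br /&gt;
sbatch --time=04:00:00 my_job.sh&lt;br /&gt;
&lt;/pre&gt;&lt;br /&gt;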
⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Maintenance Complete&lt;br /&gt;
| date = 2025/01/22&lt;br /&gt;
| message = The ARC cluster upgrade is complete.&lt;br /&gt;
&lt;br /&gt;
During this time the following changes happened:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet has replaced the 11-year-old, unsupported InfiniBand on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node jobs (MPI) running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, cpu2022.&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer was replaced successfully.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system was updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system was upgraded.&lt;br /&gt;
&lt;br /&gt;
5. The Open OnDemand web portal was upgraded.&lt;br /&gt;
&lt;br /&gt;
6. The Parallel partition was renamed to Legacy to reflect the lack of a high-speed interconnect for parallel MPI work, and it is now restricted to jobs of at most 4 nodes.&lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns.&lt;br /&gt;
&lt;br /&gt;
Jan 23, 9:08AM&lt;br /&gt;
&lt;br /&gt;
Remount complete. ARC is back in full service.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Module Command Upgrade&lt;br /&gt;
| date = 2025/02/03&lt;br /&gt;
| message = Upgrade of the module command&lt;br /&gt;
&lt;br /&gt;
On Tuesday, February 11, 2025, the module command will be upgraded to a new version on ARC. This should result in new capabilities and a slightly different visual experience when using the module command. Loading modules is not expected to change.&lt;br /&gt;
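&lt;br /&gt;
For reference, typical module usage, which should work the same after the upgrade (the module name below is illustrative):&lt;br /&gt;
&lt;pre&gt;&lt;br /&gt;
module avail              # list available software modules&lt;br /&gt;
module load python/3.10   # load a module (illustrative name)&lt;br /&gt;
module list               # show currently loaded modules&lt;br /&gt;
&lt;/pre&gt;&lt;br /&gt;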
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Module Command Upgraded&lt;br /&gt;
| date = 2025/02/12&lt;br /&gt;
| message = The module command was upgraded&lt;br /&gt;
&lt;br /&gt;
On Tuesday, February 11, 2025, the module command was upgraded to a new version on ARC. This results in new capabilities and a slightly different visual experience when using the module command. Loading modules has not changed.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address down&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca Unavailable&lt;br /&gt;
&lt;br /&gt;
Please be informed that our RCS support email address (support@hpc.ucalgary.ca) is currently not working. We are working to restore it as soon as possible; please keep an eye on this space for updates. The clusters are operating normally, but support will not receive your messages at this time. We will begin responding as soon as the address is restored.&lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address functional&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca is back&lt;br /&gt;
&lt;br /&gt;
support@hpc.ucalgary.ca has been repaired and RCS can be contacted there. If you reached out for assistance in recent days without a response, please follow up, as we may not have received your initial email.&lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Navbox ARC}}&lt;br /&gt;
[[Category:ARC]]&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=Altis_Login_Node_Status&amp;diff=3747</id>
		<title>Altis Login Node Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=Altis_Login_Node_Status&amp;diff=3747"/>
		<updated>2025-03-07T20:09:25Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/09/03&lt;br /&gt;
| message =&lt;br /&gt;
The ARC Cluster and the Altis login node are operational. No upcoming upgrades are planned.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/27&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable &lt;br /&gt;
from Sept 23 to Sept 27 inclusive (subject to change). Some Altis GPU nodes will be affected during this maintenance window. These nodes will return to service as soon as the work is complete.  &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues blocking our original maintenance window, most compute nodes that were taken offline on Monday have been brought back online today. An additional partial outage will occur starting next Tuesday for the same nodes.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control, the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Normal Scheduling has resumed. &lt;br /&gt;
| date = 2024/10/08&lt;br /&gt;
| message = The ARC cluster has been successfully brought online and nodes are running jobs normally. We apologize for the extended downtime. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = wdfgpu[1-12] System Update Reboots &lt;br /&gt;
| date = 2024/12/02&lt;br /&gt;
| message = wdfgpu[1-12] will be rebooted briefly today to install important system updates and will return to service shortly.&lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/07&lt;br /&gt;
| message = The ARC cluster will be rebooted for OS updates on Monday January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back, but queued jobs will remain in the queue and nodes will start scheduling when the cluster is ready. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/15&lt;br /&gt;
| message = The ARC cluster will be down for maintenance and upgrades starting at 9AM Monday, January 20, 2025, and running through Wednesday, January 22, 2025. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the duration of the upgrade window:&lt;br /&gt;
* Scheduling will be paused and new jobs will be queued. Any queued jobs will start scheduling only after the upgrade is complete.&lt;br /&gt;
* Access to files via the login node and arc-dtn will generally be available but intermittent. File transfers on the DTN node, including Globus file transfers, may be interrupted during this window.&lt;br /&gt;
&lt;br /&gt;
Please make sure to save your work prior to this outage window to avoid any loss of work.&lt;br /&gt;
&lt;br /&gt;
During this time the following changes will happen:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet will replace the 11-year-old, unsupported InfiniBand on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node jobs (MPI) running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, cpu2022.&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer will be replaced. Access to /bulk will be unavailable on Wednesday, January 22, 2025.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system will be updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system will be upgraded.&lt;br /&gt;
&lt;br /&gt;
5. The Open OnDemand web portal will be upgraded.&lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns.&lt;br /&gt;
&lt;br /&gt;
⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
Update Jan 18, 2025&lt;br /&gt;
&lt;br /&gt;
Around 10AM, Altis experienced an electrical power brownout. An unknown percentage of the nodes lost power during this time, causing the loss&lt;br /&gt;
of a number of running jobs.&lt;br /&gt;
&lt;br /&gt;
Sorry for the inconvenience.  &lt;br /&gt;
&lt;br /&gt;
Since Altis is shutting down for maintenance on Monday, Jan 20, replacement jobs will likely not start unless they request a time limit shorter than the time remaining until 8AM Monday.&lt;br /&gt;
⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Maintenance Complete&lt;br /&gt;
| date = 2025/01/22&lt;br /&gt;
| message = The ARC/Altis cluster upgrade is complete.&lt;br /&gt;
&lt;br /&gt;
During this time the following changes happened:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet has replaced the 11-year-old, unsupported InfiniBand on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node jobs (MPI) running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, cpu2022.&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer was replaced successfully.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system was updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system was upgraded.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address down&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca Unavailable&lt;br /&gt;
&lt;br /&gt;
Please be informed that our RCS support email address (support@hpc.ucalgary.ca) is currently not working. We are working to restore it as soon as possible; please keep an eye on this space for updates. The clusters are operating normally, but support will not receive your messages at this time. We will begin responding as soon as the address is restored.&lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
[[Category:ARC]]&lt;br /&gt;
{{Navbox ARC}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=Think_Login_Node_Status&amp;diff=3746</id>
		<title>Think Login Node Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=Think_Login_Node_Status&amp;diff=3746"/>
		<updated>2025-03-07T20:09:12Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/09/03&lt;br /&gt;
| message =&lt;br /&gt;
The ARC Cluster and the Think login node are operational. No upcoming upgrades are planned.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/23&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable &lt;br /&gt;
from Sept 23 to Sept 27 inclusive (subject to change). Some Think GPU nodes will be affected during this maintenance window. These nodes will return to service as soon as the work is complete.  &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues blocking our original maintenance window, most compute nodes that were taken offline on Monday have been brought back online today. An additional partial outage will occur starting next Tuesday for the same nodes.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control, the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Normal Scheduling has resumed. &lt;br /&gt;
| date = 2024/10/08&lt;br /&gt;
| message = The ARC cluster has been successfully brought online and nodes are running jobs normally. We apologize for the extended downtime. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = wdfgpu[1-12] System Update Reboots &lt;br /&gt;
| date = 2024/12/02&lt;br /&gt;
| message = wdfgpu[1-12] will be rebooted briefly today to install important system updates and will return to service shortly.&lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/07&lt;br /&gt;
| message = The ARC cluster will be rebooted for OS updates on Monday January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back, but queued jobs will remain in the queue and nodes will start scheduling when the cluster is ready. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/15&lt;br /&gt;
| message = The ARC cluster will be down for maintenance and upgrades starting at 9AM Monday, January 20, 2025, and running through Wednesday, January 22, 2025. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the duration of the upgrade window:&lt;br /&gt;
* Scheduling will be paused and new jobs will be queued. Any queued jobs will start scheduling only after the upgrade is complete.&lt;br /&gt;
* Access to files via the login node and arc-dtn will generally be available but intermittent. File transfers on the DTN node, including Globus file transfers, may be interrupted during this window.&lt;br /&gt;
&lt;br /&gt;
Please make sure to save your work prior to this outage window to avoid any loss of work.&lt;br /&gt;
&lt;br /&gt;
During this time the following changes will happen:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet will replace the 11-year-old, unsupported InfiniBand on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node jobs (MPI) running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, cpu2022.&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer will be replaced. Access to /bulk will be unavailable on Wednesday, January 22, 2025.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system will be updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system will be upgraded.&lt;br /&gt;
&lt;br /&gt;
5. The Open OnDemand web portal will be upgraded.&lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns.&lt;br /&gt;
&lt;br /&gt;
⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
Update Jan 18, 2025&lt;br /&gt;
&lt;br /&gt;
Around 10AM, ARC experienced an electrical power brownout. An unknown percentage of the nodes lost power during this time, causing the loss&lt;br /&gt;
of a number of running jobs.&lt;br /&gt;
&lt;br /&gt;
Sorry for the inconvenience.  &lt;br /&gt;
&lt;br /&gt;
Since ARC is shutting down for maintenance on Monday, Jan 20, replacement jobs will likely not start unless they request a time limit shorter than the time remaining until 8AM Monday.&lt;br /&gt;
⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Maintenance Complete&lt;br /&gt;
| date = 2025/01/22&lt;br /&gt;
| message = The ARC/Think cluster upgrade is complete.&lt;br /&gt;
&lt;br /&gt;
During this time the following changes happened:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet has replaced the 11-year-old, unsupported InfiniBand on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node jobs (MPI) running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, cpu2022.&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer was replaced successfully.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system was updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system was upgraded.&lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address down&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca Unavailable&lt;br /&gt;
&lt;br /&gt;
Please be informed that our RCS support email address (support@hpc.ucalgary.ca) is currently not working. We are working to restore it as soon as possible; please keep an eye on this space for updates. The clusters are operating normally, but support will not receive your messages at this time. We will begin responding as soon as the address is restored.&lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
[[Category:ARC]]&lt;br /&gt;
{{Navbox ARC}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=TALC_Cluster_Status&amp;diff=3745</id>
		<title>TALC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=TALC_Cluster_Status&amp;diff=3745"/>
		<updated>2025-03-07T20:08:52Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{TALC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ May System Updates&lt;br /&gt;
| date = 2023/02/02&lt;br /&gt;
| message =&lt;br /&gt;
Beginning May 1, 2023, the TALC cluster will undergo operating system updates. The upgrade will happen after the end of term to minimize any disruption. Any existing jobs may be &lt;br /&gt;
temporarily held from scheduling. The upgrade is planned to be fully complete by May 5.&lt;br /&gt;
&lt;br /&gt;
The TALC login node will reboot on the morning of May 1.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = May System Updates Completed&lt;br /&gt;
| date = 2023/05/04&lt;br /&gt;
| message =&lt;br /&gt;
TALC upgrades have been completed. If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Power Interruption&lt;br /&gt;
| date = 2024/05/07&lt;br /&gt;
| message = TALC experienced a brief power outage around 11AM on May 7, 2024.&lt;br /&gt;
Most compute nodes have rebooted or are rebooting. Most jobs running at the time &lt;br /&gt;
were lost. Administrators are actively working on restarting the compute &lt;br /&gt;
nodes. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/06/26&lt;br /&gt;
| message = &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address down&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca Unavailable&lt;br /&gt;
&lt;br /&gt;
Please be informed that our RCS support email address (support@hpc.ucalgary.ca) is currently not working. We are working to restore it as soon as possible; please keep an eye on this space for updates. The clusters are operating normally, but support will not receive your messages at this time. We will begin responding as soon as the address is restored.&lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
[[Category:TALC]]&lt;br /&gt;
{{Navbox TALC}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=GLaDOS_Cluster_Status&amp;diff=3744</id>
		<title>GLaDOS Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=GLaDOS_Cluster_Status&amp;diff=3744"/>
		<updated>2025-03-07T20:08:31Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{GLaDOS Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 30, 2023, the GLaDOS cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The GLaDOS login node will reboot on the morning of January 30. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by February 3.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/31&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS Updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
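The last change means a job script can treat /tmp as node-local, job-private scratch space that is cleaned up when the job ends. A minimal sketch (file names are illustrative):&lt;br /&gt;
&lt;pre&gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --time=00:10:00&lt;br /&gt;
cp input.dat /tmp/    # /tmp here is private to this job and removed at job end&lt;br /&gt;
df -h /tmp /dev/shm   # shows the per-job mounts&lt;br /&gt;
&lt;/pre&gt;&lt;br /&gt;
&lt;br /&gt;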
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on GLaDOS Login Node&lt;br /&gt;
| date = 2023/06/23&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the GLaDOS login node. If apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to a security vulnerability that will be&lt;br /&gt;
patched soon.&lt;br /&gt;
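&lt;br /&gt;
A minimal sketch of the workaround, assuming Slurm's sbatch and a hypothetical image file my_container.sif:&lt;br /&gt;
&lt;pre&gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --time=00:10:00&lt;br /&gt;
# On a compute node the login-node user-namespace restriction does not apply.&lt;br /&gt;
apptainer exec my_container.sif echo "hello from inside the container"&lt;br /&gt;
&lt;/pre&gt;&lt;br /&gt;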
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GLaDOS Scheduled Temporary Shutdown for Move&lt;br /&gt;
| date = 2023/06/26&lt;br /&gt;
| message =&lt;br /&gt;
GLaDOS is scheduled to be shut down temporarily to allow for the &lt;br /&gt;
cluster to be physically moved, beginning Tuesday, September 5, 2023.&lt;br /&gt;
The cluster is expected to be down for the rest of the week and back &lt;br /&gt;
online on or before Monday the 11th.&lt;br /&gt;
&lt;br /&gt;
Please send any questions or concerns to support@hpc.ucalgary.ca&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GLaDOS Move Complete&lt;br /&gt;
| date = 2023/09/11&lt;br /&gt;
| message =&lt;br /&gt;
GLaDOS has been moved and jobs can be submitted for scheduling.&lt;br /&gt;
&lt;br /&gt;
Please send any questions or concerns to support@hpc.ucalgary.ca&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Power Interruption&lt;br /&gt;
| date = 2024/05/07&lt;br /&gt;
| message = GLaDOS experienced a brief power outage around 11AM on May 7, 2024.&lt;br /&gt;
Most compute nodes have rebooted or are rebooting. Most jobs running at the time &lt;br /&gt;
were lost. Administrators are actively working on restarting the compute &lt;br /&gt;
nodes. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = OS updates complete&lt;br /&gt;
| date = 2024/05/07&lt;br /&gt;
| message = GLaDOS has been updated to Rocky Linux 8.10 and is operating normally.&lt;br /&gt;
&lt;br /&gt;
Please reach out with any questions or concerns to support@hpc.ucalgary.ca&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address down&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca Unavailable&lt;br /&gt;
&lt;br /&gt;
Please be informed that our RCS support email address (support@hpc.ucalgary.ca) is currently not working. We are working to restore it as soon as possible; please keep an eye on this space for updates. The clusters are operating normally, but support will not receive your messages at this time. We will begin responding as soon as the address is restored.&lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
[[Category:GLaDOS]]&lt;br /&gt;
{{Navbox GLaDOS}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=MARC_Cluster_Status&amp;diff=3743</id>
		<title>MARC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=MARC_Cluster_Status&amp;diff=3743"/>
		<updated>2025-03-07T20:08:12Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Cluster Status&lt;br /&gt;
| cluster = MARC&lt;br /&gt;
| status = green&lt;br /&gt;
| title = Cluster operational&lt;br /&gt;
| message = Upgrades are planned for Jan 20, 2025. Please contact us if you experience system issues.&lt;br /&gt;
See the [[MARC Cluster Status]] page for system notices. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 23, 2023, the MARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The MARC login node will reboot on the morning of January 23. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by January 27.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/24&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS Updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on MARC Login Node&lt;br /&gt;
| date = 2023/06/23&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the MARC login node. If apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to a security vulnerability that will be&lt;br /&gt;
patched soon.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Storage Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2023/10/23&lt;br /&gt;
| message =&lt;br /&gt;
We will be performing storage upgrades on the MARC/ARC cluster on &lt;br /&gt;
November 16 and 17, 2023. To facilitate this, we will be throttling &lt;br /&gt;
down the number of jobs on both clusters while the upgrades are &lt;br /&gt;
performed.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = OS Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2024/09/11&lt;br /&gt;
| message =&lt;br /&gt;
MARC will be going down for OS upgrades on 2024/Sep/16. The cluster &lt;br /&gt;
will be unavailable temporarily to complete this work. Please contact&lt;br /&gt;
support@hpc.ucalgary.ca if you have any questions or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance &lt;br /&gt;
| date = 2024/12/11&lt;br /&gt;
| message = The MARC login node will be rebooted on Tuesday December 17 for scheduled maintenance. It will be down for a few minutes and return shortly. Job scheduling and jobs running on the cluster will not be affected. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/07&lt;br /&gt;
| message = The MARC cluster will be rebooted for OS updates on Monday January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back, but queued jobs will remain in the queue and nodes will start scheduling when the cluster is ready. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address down&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca Unavailable&lt;br /&gt;
&lt;br /&gt;
Please be informed that our RCS support email address (support@hpc.ucalgary.ca) is currently not working. We are working to restore it as soon as possible; please keep an eye on this space for updates. The clusters are operating normally, but support will not receive your messages at this time. We will begin responding as soon as the address is restored.&lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3742</id>
		<title>ARC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3742"/>
		<updated>2025-03-07T20:07:48Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Cluster Status&lt;br /&gt;
| cluster = ARC&lt;br /&gt;
| status = green&lt;br /&gt;
| title = Cluster operational - Power Bump Jan 18&lt;br /&gt;
| message = System is operational. Updates are planned for Jan 20. Please see the MOTD below.&lt;br /&gt;
&lt;br /&gt;
See the [[ARC Cluster Status]] page for system notices. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 16, 2023, the ARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The ARC login node will reboot on the morning of January 16. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by January 20.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/24&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS Updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/02/28&lt;br /&gt;
| message =&lt;br /&gt;
We are currently investigating a filesystem issue that is causing filesystem slowdowns across ARC.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/03/01&lt;br /&gt;
| message =&lt;br /&gt;
We are still currently investigating a filesystem issue that is causing filesystem slowdowns across ARC. Some jobs on ARC have been paused to help us find the root cause of the slowdowns.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ARC Login node reboot&lt;br /&gt;
| date = 2023/03/02&lt;br /&gt;
| message =&lt;br /&gt;
The ARC login node will be rebooted this afternoon for an emergency maintenance. This downtime is needed to help mitigate the filesystem slowdowns experienced on the login node.  Jobs will continue running and scheduling during this time.&lt;br /&gt;
&lt;br /&gt;
All logins to the ARC login node will be terminated at 3:00PM, and the node will remain unavailable until 4:00PM.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Filesystem Issues&lt;br /&gt;
| date = 2023/03/02&lt;br /&gt;
| message =&lt;br /&gt;
We are still currently investigating a filesystem issue that is causing filesystem slowdowns on specific nodes in our MSRDC location.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues Resolved&lt;br /&gt;
| date = 2023/03/10&lt;br /&gt;
| message =&lt;br /&gt;
We have upgraded the filesystem routers in our MSRDC location to address the performance issues.&lt;br /&gt;
&lt;br /&gt;
Please let us know if you experience any issues with the filesystem performance.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/05/01&lt;br /&gt;
| message =&lt;br /&gt;
On May 1, 2023, the ARC Open OnDemand node will be rebooted between 5PM and 6PM. Expected downtime will be approximately 15 minutes.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on ARC Login Node&lt;br /&gt;
| date = 2023/06/22&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the ARC login node. If apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to a security vulnerability that will be&lt;br /&gt;
patched soon.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Lattice, Single, cpu2013 partition changes&lt;br /&gt;
| date = 2023/07/13&lt;br /&gt;
| message =&lt;br /&gt;
The Lattice, Single, and cpu2013 partitions have all been decommissioned. The Single&lt;br /&gt;
partition will be replaced by the nodes formerly in the cpu2013 partition, but&lt;br /&gt;
the new partition will be called single.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/10/17&lt;br /&gt;
| message =&lt;br /&gt;
Open OnDemand will be rebooted on October 17, 2023 for an update. It will be down for up to 30 minutes.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Storage Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2023/10/23&lt;br /&gt;
| message =&lt;br /&gt;
We will be performing storage upgrades on the MARC/ARC cluster on &lt;br /&gt;
November 16 and 17, 2023. To facilitate this, we will be throttling &lt;br /&gt;
down the number of jobs on both clusters while the upgrades are &lt;br /&gt;
performed.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/05/03&lt;br /&gt;
| message =&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Power Interruption&lt;br /&gt;
| date = 2024/05/07&lt;br /&gt;
| message = ARC experienced a brief power outage around 11AM on May 7, 2024.&lt;br /&gt;
Most compute nodes have rebooted or are rebooting. Most jobs running at the time &lt;br /&gt;
were lost. ARC administrators are actively working on restarting the compute &lt;br /&gt;
nodes. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation&lt;br /&gt;
| date = 2024/06/03&lt;br /&gt;
| message = Job submissions targeted to the GPU a100 partition will be &lt;br /&gt;
affected by a temporary reservation on the nodes to accommodate the RCS&lt;br /&gt;
summer school class taking place on 2024/Jun/10. The reservation will end &lt;br /&gt;
shortly after the class. Please submit your jobs normally and the scheduler will &lt;br /&gt;
start them as soon as the nodes are available. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation Removed&lt;br /&gt;
| date = 2024/06/11&lt;br /&gt;
| message = GPU a100 Nodes in ARC have been returned to normal scheduling. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/23&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable &lt;br /&gt;
from Sept 23 to Sept 27 inclusive (subject to change). All compute nodes &lt;br /&gt;
in cpu2019, cpu2021/2, gpu-v100, and gpu-a100, and most nodes from bigmem will be &lt;br /&gt;
affected. These nodes will return to service as soon as the work is complete.  &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues blocking our original maintenance window, most compute nodes that were taken offline on Monday have been brought back online today. An additional partial outage will occur starting next Tuesday for the same nodes.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control, the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Normal Scheduling has resumed. &lt;br /&gt;
| date = 2024/10/08&lt;br /&gt;
| message = The ARC cluster has been successfully brought online and nodes are running jobs normally. We apologize for the extended downtime. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance &lt;br /&gt;
| date = 2024/12/11&lt;br /&gt;
| message = The ARC login node will be rebooted on Tuesday December 17 for scheduled maintenance. It will be down for a few minutes and return shortly. Job scheduling and jobs running on the cluster will not be affected. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/07&lt;br /&gt;
| message = The ARC cluster will be rebooted for OS updates on Monday January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back, but queued jobs will remain in the queue and nodes will start scheduling when the cluster is ready. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/15&lt;br /&gt;
| message = The ARC cluster will be down for maintenance and upgrades starting at 9AM Monday, January 20, 2025, and running through Wednesday, January 22, 2025. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the duration of the upgrade window:&lt;br /&gt;
* Scheduling will be paused and new jobs will be queued. Any queued jobs will start scheduling only after the upgrade is complete.&lt;br /&gt;
* Access to files via the login node and arc-dtn will generally be available but intermittent. File transfers on the DTN node, including Globus file transfers, may be interrupted during this window.&lt;br /&gt;
&lt;br /&gt;
Please make sure to save your work prior to this outage window to avoid any loss of work.&lt;br /&gt;
&lt;br /&gt;
During this time the following changes will happen:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet will replace the 11-year-old, unsupported InfiniBand on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node jobs (MPI) running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, cpu2022.&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer will be replaced. Access to /bulk will be unavailable on Wednesday, January 22, 2025.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system will be updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system will be upgraded.&lt;br /&gt;
&lt;br /&gt;
5. The Open OnDemand web portal will be upgraded.&lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns.&lt;br /&gt;
&lt;br /&gt;
⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
Update Jan 18, 2025&lt;br /&gt;
&lt;br /&gt;
Around 10AM, ARC experienced an electrical power brownout. An unknown percentage of the nodes lost power during this time, causing the loss of a number of running jobs.&lt;br /&gt;
&lt;br /&gt;
Sorry for the inconvenience.  &lt;br /&gt;
&lt;br /&gt;
Since ARC is shutting down for maintenance on Monday, Jan 20, replacement jobs will likely not start unless they request a time limit shorter than the time remaining until 8AM Monday.&lt;br /&gt;
⚠️⚠️⚠️⚠️⚠️&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Maintenance Complete&lt;br /&gt;
| date = 2025/01/22&lt;br /&gt;
| message = The ARC cluster upgrade is complete.&lt;br /&gt;
&lt;br /&gt;
During this time the following changes happened:&lt;br /&gt;
&lt;br /&gt;
1. Ethernet has replaced the 11-year-old, unsupported InfiniBand on the following partitions:&lt;br /&gt;
* cpu2023 (temporary)&lt;br /&gt;
* Parallel&lt;br /&gt;
* Theia/Synergy/cpu2017-bf05&lt;br /&gt;
* Single&lt;br /&gt;
Any multi-node jobs (MPI) running on these partitions will have increased latency going forward. If you run multi-node jobs, make sure to run on a partition such as cpu2019, cpu2021, cpu2022.&lt;br /&gt;
&lt;br /&gt;
2. A component of the NetApp filer was replaced successfully.&lt;br /&gt;
&lt;br /&gt;
3. The compute node operating system was updated to Rocky Linux 8.10.&lt;br /&gt;
&lt;br /&gt;
4. The Slurm scheduling system was upgraded.&lt;br /&gt;
&lt;br /&gt;
5. The Open OnDemand web portal was upgraded.&lt;br /&gt;
&lt;br /&gt;
6. The Parallel partition was renamed to Legacy to reflect the lack of a low-latency interconnect for parallel MPI work, and was restricted to jobs of at most 4 nodes.&lt;br /&gt;
&lt;br /&gt;
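As a quick check before resubmitting multi-node work, sinfo can list the state and limits of the partitions involved; the lowercase partition names below are assumptions:&lt;br /&gt;
&lt;pre&gt;
# state of the InfiniBand partitions recommended for MPI jobs
sinfo -p cpu2019,cpu2021,cpu2022

# time limit, node count, and node list of the renamed Legacy partition
sinfo -p legacy -o "%P %l %D %N"   # partition name case is an assumption
&lt;/pre&gt;
&lt;br /&gt;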
Please reach out to support@hpc.calgary.ca with any issues or concerns.&lt;br /&gt;
&lt;br /&gt;
Update: Jan 23, 9:08AM&lt;br /&gt;
&lt;br /&gt;
Remount complete. ARC is back in full service.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Module Command Upgrade&lt;br /&gt;
| date = 2025/02/03&lt;br /&gt;
| message = Upgrade of the module command&lt;br /&gt;
&lt;br /&gt;
On Tuesday, February 11, 2025, the module command will be upgraded to a new version on ARC. This should result in new capabilities and a slightly different visual experience when using the module command. Loading modules is not expected to change.&lt;br /&gt;
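Day-to-day usage should stay the same after the upgrade; a minimal sketch of the usual workflow, with an illustrative module name:&lt;br /&gt;
&lt;pre&gt;
module avail                # list software modules available on ARC
module load python/3.10     # load a module (name and version illustrative)
module list                 # show what is currently loaded
module purge                # unload all loaded modules
&lt;/pre&gt;
&lt;br /&gt;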
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Module Command Upgraded&lt;br /&gt;
| date = 2025/02/12&lt;br /&gt;
| message = The module command was upgraded.&lt;br /&gt;
&lt;br /&gt;
On February 12, 2025, the module command was upgraded to a new version on ARC. This should result in new capabilities and a slightly different visual experience when using the module command. Loading modules should not have changed.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Support email address down&lt;br /&gt;
| date = 2025/03/07&lt;br /&gt;
| message = support@hpc.ucalgary.ca Unavailable&lt;br /&gt;
&lt;br /&gt;
Please be informed that our support email address (support@hpc.ucalgary.ca) for RCS is currently not working. We are working to restore it as soon as possible; please keep an eye on this space for updates. The clusters are operating normally, but support will not receive your messages at this time. We will begin responding as soon as the address is back. &lt;br /&gt;
&lt;br /&gt;
Apologies for the inconvenience.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Navbox ARC}}&lt;br /&gt;
[[Category:ARC]]&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=GLaDOS_Cluster_Status&amp;diff=3680</id>
		<title>GLaDOS Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=GLaDOS_Cluster_Status&amp;diff=3680"/>
		<updated>2025-01-08T21:36:28Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{GLaDOS Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 30, 2023, the GLaDOS cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The GLaDOS login node will reboot on the morning of January 30. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by February 3.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/31&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS Updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted (see the sketch below)&lt;br /&gt;
&lt;br /&gt;
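A minimal sketch of how a job script can use its private, per-job /tmp for scratch files; the application name is hypothetical:&lt;br /&gt;
&lt;pre&gt;
#!/bin/bash
#SBATCH --time=00:10:00

# /tmp is private to this job and is removed when the job ends,
# so scratch files cannot collide with those of other jobs
SCRATCH=/tmp/scratch-$SLURM_JOB_ID
mkdir -p "$SCRATCH"
./my_app --workdir "$SCRATCH"   # hypothetical application
&lt;/pre&gt;
&lt;br /&gt;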
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on GLaDOS Login Node&lt;br /&gt;
| date = 2023/06/23&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the GLaDOS login node. If apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to a security vulnerability that will be&lt;br /&gt;
patched soon.&lt;br /&gt;
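&lt;br /&gt;
A minimal sketch of running a container inside a job rather than on the login node; the image and command names are illustrative only:&lt;br /&gt;
&lt;pre&gt;
# request a small job and run the container within it
srun --time=01:00:00 --mem=4G apptainer exec mycontainer.sif ./my_analysis
&lt;/pre&gt;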
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GLaDOS Scheduled Temporary Shutdown for Move&lt;br /&gt;
| date = 2023/06/26&lt;br /&gt;
| message =&lt;br /&gt;
GLaDOS is scheduled to be shut down temporarily to allow the &lt;br /&gt;
cluster to be physically moved, beginning Tuesday, September 5, 2023.&lt;br /&gt;
The cluster is expected to be down for the rest of the week and back &lt;br /&gt;
online on or before Monday the 11th.&lt;br /&gt;
&lt;br /&gt;
Please send any questions or concerns to support@hpc.ucalgary.ca&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GLaDOS Move Complete&lt;br /&gt;
| date = 2023/09/11&lt;br /&gt;
| message =&lt;br /&gt;
GLaDOS has been moved and jobs can be submitted for scheduling.&lt;br /&gt;
&lt;br /&gt;
Please send any questions or concerns to support@hpc.ucalgary.ca&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Power Interruption&lt;br /&gt;
| date = 2024/05/07&lt;br /&gt;
| message = GLaDOS experienced a brief power outage around 11AM on May 7, 2024.&lt;br /&gt;
Most compute nodes have rebooted or are rebooting. Most jobs running at the time &lt;br /&gt;
were lost. Administrators are actively working on restarting compute &lt;br /&gt;
nodes. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = OS updates complete&lt;br /&gt;
| date = 2024/05/07&lt;br /&gt;
| message = GLaDOS has been updated to Rocky Linux 8.10 and is operating normally.&lt;br /&gt;
&lt;br /&gt;
Please reach out with any questions or concerns to support@hpc.ucalgary.ca&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
[[Category:GLaDOS]]&lt;br /&gt;
{{Navbox GLaDOS}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=Think_Login_Node_Status&amp;diff=3676</id>
		<title>Think Login Node Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=Think_Login_Node_Status&amp;diff=3676"/>
		<updated>2025-01-07T22:39:24Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/09/03&lt;br /&gt;
| message =&lt;br /&gt;
The ARC cluster and the Think login node are operational. No upcoming upgrades are planned.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/23&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable &lt;br /&gt;
between Sept 23 and Sept 27 inclusive (subject to change). Some Think GPU nodes will be affected during this maintenance window. These nodes will return to service as soon as the work is complete. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues blocking our original maintenance window, most compute nodes that were taken offline on Monday have been brought back online today. An additional partial outage will begin next Tuesday for the same nodes.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control, the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Normal Scheduling has resumed. &lt;br /&gt;
| date = 2024/10/08&lt;br /&gt;
| message = The ARC cluster has been successfully brought online and nodes are running jobs normally. We apologize for the extended downtime. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = wdfgpu[1-12] System Update Reboots &lt;br /&gt;
| date = 2024/12/02&lt;br /&gt;
| message = wdfgpu[1-12] will be rebooted briefly today to install important system updates and will return to service shortly. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/07&lt;br /&gt;
| message = The ARC cluster will be rebooted for OS updates on Monday, January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back; queued jobs will remain in the queue and will resume scheduling once the cluster is ready. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
[[Category:ARC]]&lt;br /&gt;
{{Navbox ARC}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=Altis_Login_Node_Status&amp;diff=3675</id>
		<title>Altis Login Node Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=Altis_Login_Node_Status&amp;diff=3675"/>
		<updated>2025-01-07T22:38:11Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/09/03&lt;br /&gt;
| message =&lt;br /&gt;
The ARC cluster and the Altis login node are operational. No upcoming upgrades are planned.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/27&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable &lt;br /&gt;
between Sept 23 and Sept 27 inclusive (subject to change). Some Altis GPU nodes will be affected during this maintenance window. These nodes will return to service as soon as the work is complete. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues blocking our original maintenance window, most compute nodes that were taken offline on Monday have been brought back online today. An additional partial outage will begin next Tuesday for the same nodes.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control, the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Normal Scheduling has resumed. &lt;br /&gt;
| date = 2024/10/08&lt;br /&gt;
| message = The ARC cluster has been successfully brought online and nodes are running jobs normally. We apologize for the extended downtime. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = wdfgpu[1-12] System Update Reboots &lt;br /&gt;
| date = 2024/12/02&lt;br /&gt;
| message = wdfgpu[1-12] will be rebooted briefly today to install important system updates and will return to service shortly. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/07&lt;br /&gt;
| message = The ARC cluster will be rebooted for OS updates on Monday, January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back; queued jobs will remain in the queue and will resume scheduling once the cluster is ready. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
[[Category:ARC]]&lt;br /&gt;
{{Navbox ARC}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3674</id>
		<title>ARC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3674"/>
		<updated>2025-01-07T22:22:19Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Cluster Status&lt;br /&gt;
| cluster = ARC&lt;br /&gt;
| status = green&lt;br /&gt;
| title = Cluster operational&lt;br /&gt;
| message = System is operational. Updates are planned for Jan 20. Please see the MOTD.&lt;br /&gt;
&lt;br /&gt;
See the [[ARC Cluster Status]] page for system notices. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 16, 2023, the ARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The ARC login node will reboot on the morning of January 16. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by January 20.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/24&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS Updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/02/28&lt;br /&gt;
| message =&lt;br /&gt;
We are currently investigating a filesystem issue that is causing filesystem slowdowns across ARC.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/03/01&lt;br /&gt;
| message =&lt;br /&gt;
We are still investigating a filesystem issue that is causing filesystem slowdowns across ARC. Some jobs on ARC have been paused to help us find the root cause of the slowdowns.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ARC Login node reboot&lt;br /&gt;
| date = 2023/03/02&lt;br /&gt;
| message =&lt;br /&gt;
The ARC login node will be rebooted this afternoon for emergency maintenance. This downtime is needed to help mitigate the filesystem slowdowns experienced on the login node. Jobs will continue running and scheduling during this time.&lt;br /&gt;
&lt;br /&gt;
All logins to the ARC login node will be terminated at 3:00PM, and the node will remain unavailable until 4:00PM.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Filesystem Issues&lt;br /&gt;
| date = 2023/03/02&lt;br /&gt;
| message =&lt;br /&gt;
We are still investigating a filesystem issue that is causing filesystem slowdowns on specific nodes in our MSRDC location.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues Resolved&lt;br /&gt;
| date = 2023/03/10&lt;br /&gt;
| message =&lt;br /&gt;
We have upgraded the filesystem routers in our MSRDC location to address the performance issues.&lt;br /&gt;
&lt;br /&gt;
Please let us know if you experience any issues with the filesystem performance.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/05/01&lt;br /&gt;
| message =&lt;br /&gt;
On May 1, 2023, the ARC Open OnDemand node will be rebooted between 5PM and 6PM. Expected downtime will be approximately 15 minutes.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on ARC Login Node&lt;br /&gt;
| date = 2023/06/22&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the Arc login node. If apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to security vulnerability that will be&lt;br /&gt;
patched soon.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Lattice, Single, cpu2013 partition changes&lt;br /&gt;
| date = 2023/07/13&lt;br /&gt;
| message =&lt;br /&gt;
The Lattice, Single, and cpu2013 partitions have all been decommissioned.  The Single&lt;br /&gt;
partition will be replaced by the nodes formerly in the cpu2013 partition, but&lt;br /&gt;
will be called single.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/10/17&lt;br /&gt;
| message =&lt;br /&gt;
Open OnDemand will be rebooted on October 17, 2023 for an update. It will be down for up to 30 minutes.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Storage Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2023/10/23&lt;br /&gt;
| message =&lt;br /&gt;
We will be performing storage upgrades on the MARC/ARC cluster on &lt;br /&gt;
November 16 and 17, 2023. To facilitate this, we will be throttling &lt;br /&gt;
down the number of jobs on both clusters while the upgrades are &lt;br /&gt;
performed.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/05/03&lt;br /&gt;
| message =&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Power Interruption&lt;br /&gt;
| date = 2024/05/07&lt;br /&gt;
| message = ARC experienced a brief power outage around 11AM on May 7, 2024.&lt;br /&gt;
Most compute nodes have rebooted or are rebooting. Most jobs running at the time &lt;br /&gt;
were lost. ARC administrators are actively working on restarting compute &lt;br /&gt;
nodes. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation&lt;br /&gt;
| date = 2024/06/03&lt;br /&gt;
| message = Job submissions targeted to the GPU a100 partition will be &lt;br /&gt;
affected by a temporary reservation on the nodes to accommodate the RCS&lt;br /&gt;
summer school class taking place on 2024/Jun/10. The reservation will end &lt;br /&gt;
shortly afterwards. Please submit your jobs normally and the scheduler will &lt;br /&gt;
start them as soon as the nodes are available. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation Removed&lt;br /&gt;
| date = 2024/06/11&lt;br /&gt;
| message = GPU a100 Nodes in ARC have been returned to normal scheduling. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/23&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable &lt;br /&gt;
between Sept 23 and Sept 27 inclusive (subject to change). All compute nodes &lt;br /&gt;
in cpu2019, cpu2021/2, and gpu-v100, most nodes from bigmem, and gpu-a100 will be &lt;br /&gt;
affected. These nodes will return to service as soon as the work is complete.  &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues blocking our original maintenance window, most compute nodes that were taken offline on Monday have been brought back online today. An additional partial outage will begin next Tuesday for the same nodes.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control, the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Normal Scheduling has resumed. &lt;br /&gt;
| date = 2024/10/08&lt;br /&gt;
| message = The ARC cluster has been successfully brought online and nodes are running jobs normally. We apologize for the extended downtime. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance &lt;br /&gt;
| date = 2024/12/11&lt;br /&gt;
| message = The ARC login node will be rebooted on Tuesday, December 17 for scheduled maintenance. It will be down for a few minutes and then return to service. Job scheduling and jobs running on the cluster will not be affected. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/07&lt;br /&gt;
| message = The ARC cluster will be rebooted for OS updates on Monday, January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back; queued jobs will remain in the queue and will resume scheduling once the cluster is ready. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Navbox ARC}}&lt;br /&gt;
[[Category:ARC]]&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=MARC_Cluster_Status&amp;diff=3673</id>
		<title>MARC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=MARC_Cluster_Status&amp;diff=3673"/>
		<updated>2025-01-07T22:22:13Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Cluster Status&lt;br /&gt;
| cluster = MARC&lt;br /&gt;
| status = green&lt;br /&gt;
| title = Cluster operational&lt;br /&gt;
| message = Upgrades are planned for Jan 20, 2025. Please contact us if you experience system issues.&lt;br /&gt;
See the [[MARC Cluster Status]] page for system notices. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 23, 2023, the MARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The MARC login node will reboot on the morning of January 23. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by January 27.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/24&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS Updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on MARC Login Node&lt;br /&gt;
| date = 2023/06/23&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the MARC login node. If apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to a security vulnerability that will be&lt;br /&gt;
patched soon.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Storage Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2023/10/23&lt;br /&gt;
| message =&lt;br /&gt;
We will be performing storage upgrades on the MARC/ARC cluster on &lt;br /&gt;
November 16 and 17, 2023. To facilitate this, we will be throttling &lt;br /&gt;
down the number of jobs on both clusters while the upgrades are &lt;br /&gt;
performed.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = OS Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2024/09/11&lt;br /&gt;
| message =&lt;br /&gt;
MARC will be going down for OS upgrades on 2024/Sep/16. The cluster &lt;br /&gt;
will be unavailable temporarily to complete this work. Please contact&lt;br /&gt;
support@hpc.ucalgary.ca if you have any questions or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance &lt;br /&gt;
| date = 2024/12/11&lt;br /&gt;
| message = The MARC login node will be rebooted on Tuesday, December 17 for scheduled maintenance. It will be down for a few minutes and then return to service. Job scheduling and jobs running on the cluster will not be affected. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/07&lt;br /&gt;
| message = The MARC cluster will be rebooted for OS updates on Monday, January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back; queued jobs will remain in the queue and will resume scheduling once the cluster is ready. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3671</id>
		<title>ARC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3671"/>
		<updated>2025-01-07T21:58:56Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 16, 2023, the ARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The ARC login node will reboot on the morning of January 16. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by January 20.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/24&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS Updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/02/28&lt;br /&gt;
| message =&lt;br /&gt;
We are currently investigating a filesystem issue that is causing filesystem slowdowns across ARC.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/03/01&lt;br /&gt;
| message =&lt;br /&gt;
We are still investigating a filesystem issue that is causing filesystem slowdowns across ARC. Some jobs on ARC have been paused to help us find the root cause of the slowdowns.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ARC Login node reboot&lt;br /&gt;
| date = 2023/03/02&lt;br /&gt;
| message =&lt;br /&gt;
The ARC login node will be rebooted this afternoon for emergency maintenance. This downtime is needed to help mitigate the filesystem slowdowns experienced on the login node. Jobs will continue running and scheduling during this time.&lt;br /&gt;
&lt;br /&gt;
All logins to the ARC login node will be terminated at 3:00PM, and the node will remain unavailable until 4:00PM.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Filesystem Issues&lt;br /&gt;
| date = 2023/03/02&lt;br /&gt;
| message =&lt;br /&gt;
We are still investigating a filesystem issue that is causing filesystem slowdowns on specific nodes in our MSRDC location.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues Resolved&lt;br /&gt;
| date = 2023/03/10&lt;br /&gt;
| message =&lt;br /&gt;
We have upgraded the filesystem routers in our MSRDC location to address the performance issues.&lt;br /&gt;
&lt;br /&gt;
Please let us know if you experience any issues with the filesystem performance.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/05/01&lt;br /&gt;
| message =&lt;br /&gt;
On May 1, 2023, the ARC Open OnDemand node will be rebooted between 5PM and 6PM. Expected downtime will be approximately 15 minutes.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on ARC Login Node&lt;br /&gt;
| date = 2023/06/22&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the Arc login node. If apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to a security vulnerability that will be&lt;br /&gt;
patched soon.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Lattice, Single, cpu2013 partition changes&lt;br /&gt;
| date = 2023/07/13&lt;br /&gt;
| message =&lt;br /&gt;
The Lattice, Single, and cpu2013 partitions have all been decommissioned.  The Single&lt;br /&gt;
partition will be replaced by the nodes formerly in the cpu2013 partition, but&lt;br /&gt;
will be called single.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/10/17&lt;br /&gt;
| message =&lt;br /&gt;
Open OnDemand will be rebooted on October 17, 2023 for an update. It will be down for up to 30 minutes.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Storage Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2023/10/23&lt;br /&gt;
| message =&lt;br /&gt;
We will be performing storage upgrades on the MARC/ARC cluster on &lt;br /&gt;
November 16 and 17, 2023. To facilitate this, we will be throttling &lt;br /&gt;
down the number of jobs on both clusters while the upgrades are &lt;br /&gt;
performed.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/05/03&lt;br /&gt;
| message =&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Power Interruption&lt;br /&gt;
| date = 2024/05/07&lt;br /&gt;
| message = ARC experienced a brief power outage around 11AM on May 7, 2024.&lt;br /&gt;
Most compute nodes have rebooted or are rebooting. Most jobs running at the time &lt;br /&gt;
were lost. ARC administrators are actively working on restarting compute &lt;br /&gt;
nodes. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation&lt;br /&gt;
| date = 2024/06/03&lt;br /&gt;
| message = Job submissions targeted to the GPU a100 partition will be &lt;br /&gt;
affected by a temporary reservation on the nodes to accommodate the RCS&lt;br /&gt;
summer school class taking place on 2024/Jun/10. The reservation will end &lt;br /&gt;
shortly afterwards. Please submit your jobs normally and the scheduler will &lt;br /&gt;
start them as soon as the nodes are available. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation Removed&lt;br /&gt;
| date = 2024/06/11&lt;br /&gt;
| message = GPU a100 Nodes in ARC have been returned to normal scheduling. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/23&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable &lt;br /&gt;
between Sept 23 and Sept 27 inclusive (subject to change). All compute nodes &lt;br /&gt;
in cpu2019, cpu2021/2, and gpu-v100, most nodes from bigmem, and gpu-a100 will be &lt;br /&gt;
affected. These nodes will return to service as soon as the work is complete.  &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues blocking our original maintenance window, most compute nodes that were taken offline on Monday have been brought back online today. An additional partial outage will begin next Tuesday for the same nodes.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control, the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Normal Scheduling has resumed. &lt;br /&gt;
| date = 2024/10/08&lt;br /&gt;
| message = The ARC cluster has been successfully brought online and nodes are running jobs normally. We apologize for the extended downtime. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance &lt;br /&gt;
| date = 2024/12/11&lt;br /&gt;
| message = The ARC login node will be rebooted on Tuesday, December 17 for scheduled maintenance. It will be down for a few minutes and then return to service. Job scheduling and jobs running on the cluster will not be affected. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/07&lt;br /&gt;
| message = The ARC cluster will be rebooted for OS updates on Monday, January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back; queued jobs will remain in the queue and will resume scheduling once the cluster is ready. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Navbox ARC}}&lt;br /&gt;
[[Category:ARC]]&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=MARC_Cluster_Status&amp;diff=3670</id>
		<title>MARC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=MARC_Cluster_Status&amp;diff=3670"/>
		<updated>2025-01-07T21:58:47Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{MARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 23, 2023, the MARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The MARC login node will reboot on the morning of January 23. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by January 27.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/24&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS Updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on MARC Login Node&lt;br /&gt;
| date = 2023/06/23&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the MARC login node. If apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to a security vulnerability that will be&lt;br /&gt;
patched soon.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Storage Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2023/10/23&lt;br /&gt;
| message =&lt;br /&gt;
We will be performing storage upgrades on the MARC/ARC cluster on &lt;br /&gt;
November 16 and 17, 2023. To facilitate this, we will be throttling &lt;br /&gt;
down the number of jobs on both clusters while the upgrades are &lt;br /&gt;
performed.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = OS Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2024/09/11&lt;br /&gt;
| message =&lt;br /&gt;
MARC will be going down for OS upgrades on 2024/Sep/16. The cluster &lt;br /&gt;
will be unavailable temporarily to complete this work. Please contact&lt;br /&gt;
support@hpc.ucalgary.ca if you have any questions or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance &lt;br /&gt;
| date = 2024/12/11&lt;br /&gt;
| message = The MARC login node will be rebooted on Tuesday, December 17 for scheduled maintenance. It will be down for a few minutes and then return to service. Job scheduling and jobs running on the cluster will not be affected. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance and OS Update&lt;br /&gt;
| date = 2025/01/07&lt;br /&gt;
| message = The MARC cluster will be rebooted for OS updates on Monday, January 20, 2025. Please make sure to save your work and log out before the reboot happens. Scheduling will be paused until the cluster is back; queued jobs will remain in the queue and will resume scheduling once the cluster is ready. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=MARC_Cluster_Status&amp;diff=3655</id>
		<title>MARC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=MARC_Cluster_Status&amp;diff=3655"/>
		<updated>2024-12-11T20:16:48Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{MARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 23, 2023, the MARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The MARC login node will reboot on the morning of January 23. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by January 27.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/24&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS Updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on MARC Login Node&lt;br /&gt;
| date = 2023/06/23&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the MARC login node. If apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to a security vulnerability that will be&lt;br /&gt;
patched soon.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Storage Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2023/10/23&lt;br /&gt;
| message =&lt;br /&gt;
We will be performing storage upgrades on the MARC/ARC cluster on &lt;br /&gt;
November 16 and 17, 2023. To facilitate this, we will be throttling &lt;br /&gt;
down the number of jobs on both clusters while the upgrades are &lt;br /&gt;
performed.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = OS Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2024/09/11&lt;br /&gt;
| message =&lt;br /&gt;
MARC will be going down for OS upgrades on 2024/Sep/16. The cluster &lt;br /&gt;
will be unavailable temporarily to complete this work. Please contact&lt;br /&gt;
support@hpc.ucalgary.ca if you have any questions or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance &lt;br /&gt;
| date = 2024/12/11&lt;br /&gt;
| message = The MARC login node will be rebooted on Tuesday, December 17 for scheduled maintenance. It will be down for a few minutes and then return to service. Job scheduling and jobs running on the cluster will not be affected. Thank you for understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3654</id>
		<title>ARC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3654"/>
		<updated>2024-12-11T20:16:04Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 16, 2023, the ARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The ARC login node will reboot on the morning of January 16. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by January 20.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/24&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS Updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/02/28&lt;br /&gt;
| message =&lt;br /&gt;
We are currently investigating a filesystem issue that is causing filesystem slowdowns across ARC.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/03/01&lt;br /&gt;
| message =&lt;br /&gt;
We are still investigating a filesystem issue that is causing filesystem slowdowns across ARC. Some jobs on ARC have been paused to help us find the root cause of the slowdowns.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ARC Login node reboot&lt;br /&gt;
| date = 2023/03/02&lt;br /&gt;
| message =&lt;br /&gt;
The ARC login node will be rebooted this afternoon for emergency maintenance. This downtime is needed to help mitigate the filesystem slowdowns experienced on the login node. Jobs will continue running and scheduling during this time.&lt;br /&gt;
&lt;br /&gt;
All sessions on the ARC login node will be terminated at 3:00 PM, and the node will remain unavailable until 4:00 PM.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Filesystem Issues&lt;br /&gt;
| date = 2023/03/02&lt;br /&gt;
| message =&lt;br /&gt;
We are still investigating a filesystem issue that is causing slowdowns on specific nodes in our MSRDC location.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues Resolved&lt;br /&gt;
| date = 2023/03/10&lt;br /&gt;
| message =&lt;br /&gt;
We have upgraded the filesystem routers in our MSRDC location to address the performance issues.&lt;br /&gt;
&lt;br /&gt;
Please let us know if you experience any issues with the filesystem performance.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/05/01&lt;br /&gt;
| message =&lt;br /&gt;
On May 1, 2023, the ARC Open OnDemand node will be rebooted between 5PM and 6PM. Expected downtime will be approximately 15 minutes.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on ARC Login Node&lt;br /&gt;
| date = 2023/06/22&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the ARC login node. If apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to a security vulnerability that will be&lt;br /&gt;
patched soon.&lt;br /&gt;
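&lt;br /&gt;
As a minimal sketch of the workaround (the image name, command, and resource requests below are placeholders only):&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 # resource requests below are examples only&lt;br /&gt;
 #SBATCH --time=01:00:00&lt;br /&gt;
 #SBATCH --mem=4G&lt;br /&gt;
 # mycontainer.sif and mycommand are placeholders&lt;br /&gt;
 apptainer exec mycontainer.sif mycommand&lt;br /&gt;
Submitting this script with sbatch runs the container on a compute node rather than the login node, avoiding the user-namespace error.&lt;br /&gt;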
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Lattice, Single, cpu2013 partition changes&lt;br /&gt;
| date = 2023/07/13&lt;br /&gt;
| message =&lt;br /&gt;
The Lattice, Single, and cpu2013 partitions have all been decommissioned. The&lt;br /&gt;
Single partition will be replaced by the nodes formerly in the cpu2013&lt;br /&gt;
partition, and the replacement partition will be called single.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/10/17&lt;br /&gt;
| message =&lt;br /&gt;
Open OnDemand will be rebooted on October 17, 2023 for an update. It will be down for up to 30 minutes.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Storage Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2023/10/23&lt;br /&gt;
| message =&lt;br /&gt;
We will be performing storage upgrades on the MARC and ARC clusters on&lt;br /&gt;
November 16 and 17, 2023. To facilitate this, we will be throttling&lt;br /&gt;
down the number of jobs on both clusters while the upgrades are&lt;br /&gt;
performed.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/05/03&lt;br /&gt;
| message =&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Power Interruption&lt;br /&gt;
| date = 2024/05/07&lt;br /&gt;
| message = ARC experienced a brief power outage around 11 AM on May 7, 2024.&lt;br /&gt;
Most compute nodes have rebooted or are rebooting. Most jobs running at the&lt;br /&gt;
time were lost. ARC administrators are actively working on restarting compute&lt;br /&gt;
nodes. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation&lt;br /&gt;
| date = 2024/06/03&lt;br /&gt;
| message = Job submissions targeted to the GPU a100 partition will be&lt;br /&gt;
affected by a temporary reservation on the nodes to accommodate the RCS&lt;br /&gt;
summer school class taking place on 2024/Jun/10. The reservation will end&lt;br /&gt;
shortly afterwards. Please submit your jobs normally and the scheduler will&lt;br /&gt;
start them as soon as the nodes are available. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation Removed&lt;br /&gt;
| date = 2024/06/11&lt;br /&gt;
| message = GPU a100 Nodes in ARC have been returned to normal scheduling. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/23&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable&lt;br /&gt;
between Sept 23 and Sept 27 inclusive (subject to change). All compute nodes&lt;br /&gt;
in cpu2019, cpu2021/2, and gpu-v100, and most nodes from bigmem and gpu-a100,&lt;br /&gt;
will be affected. These nodes will return to service as soon as the work is complete.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues blocking our original maintenance window, most compute nodes that were taken offline on Monday have been brought back online today. An additional partial outage will begin next Tuesday for the same nodes.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control, the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Normal Scheduling Has Resumed&lt;br /&gt;
| date = 2024/10/08&lt;br /&gt;
| message = The ARC cluster has been successfully brought online and nodes are running jobs normally. We apologize for the extended downtime. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Scheduled Maintenance &lt;br /&gt;
| date = 2024/12/11&lt;br /&gt;
| message = The ARC login node will be rebooted on Tuesday, December 17 for scheduled maintenance. It will be offline for a few minutes and return to service shortly. Job scheduling and jobs running on the cluster will not be affected. Thank you for your understanding. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Navbox ARC}}&lt;br /&gt;
[[Category:ARC]]&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=Think_Login_Node_Status&amp;diff=3651</id>
		<title>Think Login Node Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=Think_Login_Node_Status&amp;diff=3651"/>
		<updated>2024-12-02T20:11:47Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/09/03&lt;br /&gt;
| message =&lt;br /&gt;
The ARC Cluster and the Think login node are operational. No upcoming upgrades are planned.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/23&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable &lt;br /&gt;
between Sept 23 and Sept 27 inclusive (subject to change). Some Think GPU nodes will be affected during this maintenance window. These nodes will return to service as soon as the work is complete.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues blocking our original maintenance window, most compute nodes that were taken offline on Monday have been brought back online today. An additional partial outage will begin next Tuesday for the same nodes.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control, the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Normal Scheduling Has Resumed&lt;br /&gt;
| date = 2024/10/08&lt;br /&gt;
| message = The ARC cluster has been successfully brought online and nodes are running jobs normally. We apologize for the extended downtime. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = wdfgpu[1-12] System Update Reboots &lt;br /&gt;
| date = 2024/12/02&lt;br /&gt;
| message = wdfgpu[1-12] will be rebooted briefly today to install important system updates and will return to service shortly. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
[[Category:ARC]]&lt;br /&gt;
{{Navbox ARC}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=Altis_Login_Node_Status&amp;diff=3650</id>
		<title>Altis Login Node Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=Altis_Login_Node_Status&amp;diff=3650"/>
		<updated>2024-12-02T20:11:18Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/09/03&lt;br /&gt;
| message =&lt;br /&gt;
The ARC Cluster and the Altis login node are operational. No upcoming upgrades are planned.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/27&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable &lt;br /&gt;
between Sept 23 and Sept 27 inclusive (subject to change). Some Altis GPU nodes will be affected during this maintenance window. These nodes will return to service as soon as the work is complete.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues blocking our original maintenance window, most compute nodes that were taken offline on Monday have been brought back online today. An additional partial outage will occur again starting next Tuesday for the same nodes.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control, the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Normal Scheduling Has Resumed&lt;br /&gt;
| date = 2024/10/08&lt;br /&gt;
| message = The ARC cluster has been successfully brought online and nodes are running jobs normally. We apologize for the extended downtime. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = wdfgpu[1-12] System Update Reboots &lt;br /&gt;
| date = 2024/12/02&lt;br /&gt;
| message = wdfgpu[1-12] will be rebooted briefly today to install important system updates and will return to service shortly. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
[[Category:ARC]]&lt;br /&gt;
{{Navbox ARC}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3595</id>
		<title>ARC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3595"/>
		<updated>2024-10-08T19:07:52Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 16, 2023, the ARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The ARC login node will reboot on the morning of January 16. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by January 20.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/24&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/02/28&lt;br /&gt;
| message =&lt;br /&gt;
We are currently investigating a filesystem issue that is causing slowdowns across ARC.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/03/01&lt;br /&gt;
| message =&lt;br /&gt;
We are still investigating a filesystem issue that is causing slowdowns across ARC. Some jobs on ARC have been paused to help us find the root cause of the slowdowns.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ARC Login node reboot&lt;br /&gt;
| date = 2023/03/02&lt;br /&gt;
| message =&lt;br /&gt;
The ARC login node will be rebooted this afternoon for emergency maintenance. This downtime is needed to help mitigate the filesystem slowdowns experienced on the login node. Jobs will continue running and scheduling during this time.&lt;br /&gt;
&lt;br /&gt;
All sessions on the ARC login node will be terminated at 3:00 PM, and the node will remain unavailable until 4:00 PM.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Filesystem Issues&lt;br /&gt;
| date = 2023/03/02&lt;br /&gt;
| message =&lt;br /&gt;
We are still investigating a filesystem issue that is causing slowdowns on specific nodes in our MSRDC location.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues Resolved&lt;br /&gt;
| date = 2023/03/10&lt;br /&gt;
| message =&lt;br /&gt;
We have upgraded the filesystem routers in our MSRDC location to address the performance issues.&lt;br /&gt;
&lt;br /&gt;
Please let us know if you experience any issues with the filesystem performance.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/05/01&lt;br /&gt;
| message =&lt;br /&gt;
On May 1, 2023, the ARC Open OnDemand node will be rebooted between 5PM and 6PM. Expected downtime will be approximately 15 minutes.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on ARC Login Node&lt;br /&gt;
| date = 2023/06/22&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the ARC login node. If apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to a security vulnerability that will be&lt;br /&gt;
patched soon.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Lattice, Single, cpu2013 partition changes&lt;br /&gt;
| date = 2023/07/13&lt;br /&gt;
| message =&lt;br /&gt;
The Lattice, Single, and cpu2013 partitions have all been decommissioned. The&lt;br /&gt;
Single partition will be replaced by the nodes formerly in the cpu2013&lt;br /&gt;
partition, and the replacement partition will be called single.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/10/17&lt;br /&gt;
| message =&lt;br /&gt;
Open OnDemand will be rebooted on October 17, 2023 for an update. It will be down for up to 30 minutes.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Storage Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2023/10/23&lt;br /&gt;
| message =&lt;br /&gt;
We will be performing storage upgrades on the MARC and ARC clusters on&lt;br /&gt;
November 16 and 17, 2023. To facilitate this, we will be throttling&lt;br /&gt;
down the number of jobs on both clusters while the upgrades are&lt;br /&gt;
performed.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/05/03&lt;br /&gt;
| message =&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Power Interruption&lt;br /&gt;
| date = 2024/05/07&lt;br /&gt;
| message = ARC experienced a brief power outage around 11 AM on May 7, 2024.&lt;br /&gt;
Most compute nodes have rebooted or are rebooting. Most jobs running at the&lt;br /&gt;
time were lost. ARC administrators are actively working on restarting compute&lt;br /&gt;
nodes. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation&lt;br /&gt;
| date = 2024/06/03&lt;br /&gt;
| message = Job submissions targeted to the GPU a100 partition will be&lt;br /&gt;
affected by a temporary reservation on the nodes to accommodate the RCS&lt;br /&gt;
summer school class taking place on 2024/Jun/10. The reservation will end&lt;br /&gt;
shortly afterwards. Please submit your jobs normally and the scheduler will&lt;br /&gt;
start them as soon as the nodes are available. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation Removed&lt;br /&gt;
| date = 2024/06/11&lt;br /&gt;
| message = GPU a100 Nodes in ARC have been returned to normal scheduling. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/23&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable&lt;br /&gt;
between Sept 23 and Sept 27 inclusive (subject to change). All compute nodes&lt;br /&gt;
in cpu2019, cpu2021/2, and gpu-v100, and most nodes from bigmem and gpu-a100,&lt;br /&gt;
will be affected. These nodes will return to service as soon as the work is complete.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues blocking our original maintenance window, most compute nodes that were taken offline on Monday have been brought back online today. An additional partial outage will begin next Tuesday for the same nodes.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control, the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Normal Scheduling Has Resumed&lt;br /&gt;
| date = 2024/10/08&lt;br /&gt;
| message = The ARC cluster has been successfully brought online and nodes are running jobs normally. We apologize for the extended downtime. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Navbox ARC}}&lt;br /&gt;
[[Category:ARC]]&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=Think_Login_Node_Status&amp;diff=3594</id>
		<title>Think Login Node Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=Think_Login_Node_Status&amp;diff=3594"/>
		<updated>2024-10-08T19:07:49Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/09/03&lt;br /&gt;
| message =&lt;br /&gt;
The ARC Cluster and the Think login node are operational. No upcoming upgrades are planned.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/23&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable &lt;br /&gt;
between Sept 23 and Sept 27 inclusive (subject to change). Some Think GPU nodes will be affected during this maintenance window. These nodes will return to service as soon as the work is complete.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues blocking our original maintenance window, most compute nodes that were taken offline on Monday have been brought back online today. An additional partial outage will begin next Tuesday for the same nodes.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control, the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Normal Scheduling Has Resumed&lt;br /&gt;
| date = 2024/10/08&lt;br /&gt;
| message = The ARC cluster has been successfully brought online and nodes are running jobs normally. We apologize for the extended downtime. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
[[Category:ARC]]&lt;br /&gt;
{{Navbox ARC}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=Altis_Login_Node_Status&amp;diff=3593</id>
		<title>Altis Login Node Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=Altis_Login_Node_Status&amp;diff=3593"/>
		<updated>2024-10-08T19:07:46Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/09/03&lt;br /&gt;
| message =&lt;br /&gt;
The ARC Cluster and the Altis login node are operational. No upcoming upgrades are planned.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/27&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable &lt;br /&gt;
between Sept 23 and Sept 27 inclusive (subject to change). Some Altis GPU nodes will be affected during this maintenance window. These nodes will return to service as soon as the work is complete.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues blocking our original maintenance window, most compute nodes that were taken offline on Monday have been brought back online today. An additional partial outage will occur again starting next Tuesday for the same nodes.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control, the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Normal Scheduling Has Resumed&lt;br /&gt;
| date = 2024/10/08&lt;br /&gt;
| message = The ARC cluster has been successfully brought online and nodes are running jobs normally. We apologize for the extended downtime. &lt;br /&gt;
&lt;br /&gt;
Please reach out to support@hpc.ucalgary.ca with any issues or concerns. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
[[Category:ARC]]&lt;br /&gt;
{{Navbox ARC}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=Think_Login_Node_Status&amp;diff=3588</id>
		<title>Think Login Node Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=Think_Login_Node_Status&amp;diff=3588"/>
		<updated>2024-10-07T17:23:49Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/09/03&lt;br /&gt;
| message =&lt;br /&gt;
The ARC Cluster and the Think login node are operational. No upcoming upgrades are planned.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/23&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable &lt;br /&gt;
between Sept 23 and Sept 27 inclusive (subject to change). Some Think GPU nodes will be affected during this maintenance window. These nodes will return to service as soon as the work is complete.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues blocking our original maintenance window, most compute nodes that were taken offline on Monday have been brought back online today. An additional partial outage will begin next Tuesday for the same nodes.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control, the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:ARC]]&lt;br /&gt;
{{Navbox ARC}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=Altis_Login_Node_Status&amp;diff=3587</id>
		<title>Altis Login Node Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=Altis_Login_Node_Status&amp;diff=3587"/>
		<updated>2024-10-07T17:22:37Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/09/03&lt;br /&gt;
| message =&lt;br /&gt;
The ARC Cluster and the Altis login node are operational. No upcoming upgrades are planned.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/27&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable &lt;br /&gt;
between Sept 23 and Sept 27 inclusive (subject to change). Some Altis GPU nodes will be affected during this maintenance window. These nodes will return to service as soon as the work is complete.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues blocking our original maintenance window, most compute nodes that were taken offline on Monday have been brought back online today. An additional partial outage will occur again starting next Tuesday for the same nodes.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control, the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:ARC]]&lt;br /&gt;
{{Navbox ARC}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3586</id>
		<title>ARC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3586"/>
		<updated>2024-10-07T17:21:35Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 16, 2023, the ARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The ARC login node will reboot on the morning of January 16. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by January 20.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/24&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/02/28&lt;br /&gt;
| message =&lt;br /&gt;
We are currently investigating a filesystem issue that is causing slowdowns across ARC.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/03/01&lt;br /&gt;
| message =&lt;br /&gt;
We are still investigating a filesystem issue that is causing slowdowns across ARC. Some jobs on ARC have been paused to help us find the root cause of the slowdowns.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ARC Login node reboot&lt;br /&gt;
| date = 2023/03/02&lt;br /&gt;
| message =&lt;br /&gt;
The ARC login node will be rebooted this afternoon for emergency maintenance. This downtime is needed to help mitigate the filesystem slowdowns experienced on the login node. Jobs will continue running and scheduling during this time.&lt;br /&gt;
&lt;br /&gt;
All sessions on the ARC login node will be terminated at 3:00 PM, and the node will remain unavailable until 4:00 PM.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Filesystem Issues&lt;br /&gt;
| date = 2023/03/02&lt;br /&gt;
| message =&lt;br /&gt;
We are still investigating a filesystem issue that is causing slowdowns on specific nodes in our MSRDC location.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues Resolved&lt;br /&gt;
| date = 2023/03/10&lt;br /&gt;
| message =&lt;br /&gt;
We have upgraded the filesystem routers in our MSRDC location to address the performance issues.&lt;br /&gt;
&lt;br /&gt;
Please let us know if you experience any issues with the filesystem performance.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/05/01&lt;br /&gt;
| message =&lt;br /&gt;
On May 1, 2023, the ARC Open OnDemand node will be rebooted between 5PM and 6PM. Expected downtime will be approximately 15 minutes.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on ARC Login Node&lt;br /&gt;
| date = 2023/06/22&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the ARC login node. If apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to a security vulnerability that will be&lt;br /&gt;
patched soon.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Lattice, Single, cpu2013 partition changes&lt;br /&gt;
| date = 2023/07/13&lt;br /&gt;
| message =&lt;br /&gt;
The Lattice, Single, and cpu2013 partitions have all been decommissioned. The&lt;br /&gt;
Single partition will be replaced by the nodes formerly in the cpu2013&lt;br /&gt;
partition, and the replacement partition will be called single.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/10/17&lt;br /&gt;
| message =&lt;br /&gt;
Open OnDemand will be rebooted on October 17, 2023 for an update. It will be down for up to 30 minutes.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Storage Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2023/10/23&lt;br /&gt;
| message =&lt;br /&gt;
We will be performing storage upgrades on the MARC and ARC clusters on&lt;br /&gt;
November 16 and 17, 2023. To facilitate this, we will be throttling&lt;br /&gt;
down the number of jobs on both clusters while the upgrades are&lt;br /&gt;
performed.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/05/03&lt;br /&gt;
| message =&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Power Interruption&lt;br /&gt;
| date = 2024/05/07&lt;br /&gt;
| message = ARC experienced a brief power outage around 11 AM on May 7, 2024.&lt;br /&gt;
Most compute nodes have rebooted or are rebooting. Most jobs running at the&lt;br /&gt;
time were lost. ARC administrators are actively working on restarting compute&lt;br /&gt;
nodes. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation&lt;br /&gt;
| date = 2024/06/03&lt;br /&gt;
| message = Job submissions targeted to the GPU a100 partition will be&lt;br /&gt;
affected by a temporary reservation on the nodes to accommodate the RCS&lt;br /&gt;
summer school class taking place on 2024/Jun/10. The reservation will end&lt;br /&gt;
shortly afterwards. Please submit your jobs normally and the scheduler will&lt;br /&gt;
start them as soon as the nodes are available. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation Removed&lt;br /&gt;
| date = 2024/06/11&lt;br /&gt;
| message = GPU a100 Nodes in ARC have been returned to normal scheduling. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/23&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable&lt;br /&gt;
between Sept 23 and Sept 27 inclusive (subject to change). All compute nodes&lt;br /&gt;
in cpu2019, cpu2021/2, and gpu-v100, and most nodes from bigmem and gpu-a100,&lt;br /&gt;
will be affected. These nodes will return to service as soon as the work is complete.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update I&lt;br /&gt;
| date = 2024/09/25&lt;br /&gt;
| message = Due to hardware issues blocking our original maintenance window, most compute nodes that were taken offline on Monday have been brought back online today. An additional partial outage will begin next Tuesday for the same nodes.&lt;br /&gt;
&lt;br /&gt;
On Tuesday, October 1, 2024, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until Friday October 4, 2024.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update II&lt;br /&gt;
| date = 2024/10/04&lt;br /&gt;
| message = The maintenance window will be extended until at least Monday, October 7, 2024 due to a power distribution issue in our renovated data centre.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Monday, October 7, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Partial Outage Update III&lt;br /&gt;
| date = 2024/10/07&lt;br /&gt;
| message = Due to technical issues beyond our control, the maintenance window will be extended until at least Tuesday, October 15, 2024.&lt;br /&gt;
&lt;br /&gt;
Currently, the compute nodes in cpu2019, cpu2021, cpu2022, gpu-v100, gpu-a100, and most nodes from bigmem will be unavailable until at least Tuesday, October 15, 2024. Affected WDF-Altis GPU nodes include: wdfgpu[1-2,6,8-12].&lt;br /&gt;
&lt;br /&gt;
We apologize for the extended downtime and will update you as soon as we have additional information from our operations team.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Navbox ARC}}&lt;br /&gt;
[[Category:ARC]]&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=MARC_Cluster_Status&amp;diff=3551</id>
		<title>MARC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=MARC_Cluster_Status&amp;diff=3551"/>
		<updated>2024-09-11T16:50:18Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{MARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 23, 2023, the MARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The MARC login node will reboot on the morning of January 23. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by January 27.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/24&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on MARC Login Node&lt;br /&gt;
| date = 2023/06/23&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the MARC login node. If apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to a security vulnerability that will be&lt;br /&gt;
patched soon.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Storage Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2023/10/23&lt;br /&gt;
| message =&lt;br /&gt;
We will be performing storage upgrades on the MARC and ARC clusters on&lt;br /&gt;
November 16 and 17, 2023. To facilitate this, we will be throttling&lt;br /&gt;
down the number of jobs on both clusters while the upgrades are&lt;br /&gt;
performed.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = OS Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2024/09/11&lt;br /&gt;
| message =&lt;br /&gt;
MARC will be going down for OS upgrades on 2024/Sep/16. The cluster&lt;br /&gt;
will be temporarily unavailable while this work is completed. Please contact&lt;br /&gt;
support@hpc.ucalgary.ca if you have any questions or concerns.&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3529</id>
		<title>ARC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3529"/>
		<updated>2024-08-27T19:09:48Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 16, 2023, the ARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The ARC login node will reboot on the morning of January 16. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by January 20.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/24&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS Updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/02/28&lt;br /&gt;
| message =&lt;br /&gt;
We are currently investigating a filesystem issue that is causing slowdowns across ARC.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/03/01&lt;br /&gt;
| message =&lt;br /&gt;
We are still investigating a filesystem issue that is causing slowdowns across ARC. Some jobs on ARC have been paused to help us find the root cause of the slowdowns.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ARC Login node reboot&lt;br /&gt;
| date = 2023/03/02&lt;br /&gt;
| message =&lt;br /&gt;
The ARC login node will be rebooted this afternoon for emergency maintenance. This downtime is needed to help mitigate the filesystem slowdowns experienced on the login node. Jobs will continue running and scheduling during this time.&lt;br /&gt;
&lt;br /&gt;
All logins to the ARC login node will be terminated at 3:00 PM, and the node will remain unavailable until 4:00 PM.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Filesystem Issues&lt;br /&gt;
| date = 2023/03/02&lt;br /&gt;
| message =&lt;br /&gt;
We are still investigating a filesystem issue that is causing slowdowns on specific nodes in our MSRDC location.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues Resolved&lt;br /&gt;
| date = 2023/03/10&lt;br /&gt;
| message =&lt;br /&gt;
We have upgraded the filesystem routers in our MSRDC location to address the performance issues.&lt;br /&gt;
&lt;br /&gt;
Please let us know if you experience any issues with the filesystem performance.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/05/01&lt;br /&gt;
| message =&lt;br /&gt;
On May 1, 2023, the ARC Open OnDemand node will be rebooted between 5PM and 6PM. Expected downtime will be approximately 15 minutes.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on ARC Login Node&lt;br /&gt;
| date = 2023/06/22&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the ARC login node. If Apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to a security vulnerability that will be&lt;br /&gt;
patched soon.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Lattice, Single, cpu2013 partition changes&lt;br /&gt;
| date = 2023/07/13&lt;br /&gt;
| message =&lt;br /&gt;
The Lattice, Single, and cpu2013 partitions have all been decommissioned.  The Single&lt;br /&gt;
partition will be replaced by the nodes formerly in the cpu2013 partition but&lt;br /&gt;
will be called single.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/10/17&lt;br /&gt;
| message =&lt;br /&gt;
Open OnDemand will be rebooted on October 17, 2023 for an update. It will be down for up to 30 minutes.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Storage Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2023/10/23&lt;br /&gt;
| message =&lt;br /&gt;
We will be performing storage upgrades on the MARC/ARC cluster on &lt;br /&gt;
November 16 and 17, 2023. To facilitate this, we will be throttling &lt;br /&gt;
down the number of jobs on both clusters while the upgrades are &lt;br /&gt;
performed.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/05/03&lt;br /&gt;
| message =&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Power Interruption&lt;br /&gt;
| date = 2024/05/07&lt;br /&gt;
| message = ARC experienced a brief power outage around 11 AM on May 7, 2024.&lt;br /&gt;
Most compute nodes have rebooted or are rebooting.  Most jobs running at that time &lt;br /&gt;
were lost. ARC administrators are actively working on restarting compute &lt;br /&gt;
nodes. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation&lt;br /&gt;
| date = 2024/06/03&lt;br /&gt;
| message = Job submissions targeted to the GPU a100 partition will be &lt;br /&gt;
affected by a temporary reservation on the nodes to accommodate the RCS&lt;br /&gt;
summer school class taking place on 2024/Jun/10. The reservation will end &lt;br /&gt;
shortly afterwards. Please submit your jobs normally and the scheduler will &lt;br /&gt;
start them as soon as the nodes are available. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation Removed&lt;br /&gt;
| date = 2024/06/11&lt;br /&gt;
| message = GPU a100 Nodes in ARC have been returned to normal scheduling. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/27&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable &lt;br /&gt;
between Sept 23 and Sept 27 inclusive (subject to change). All compute nodes &lt;br /&gt;
in cpu2019, cpu2021/2, and gpu-v100, and most nodes from bigmem and gpu-a100, will be &lt;br /&gt;
affected. These nodes will return to service as soon as the work is complete.  &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Navbox ARC}}&lt;br /&gt;
[[Category:ARC]]&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3528</id>
		<title>ARC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3528"/>
		<updated>2024-08-27T18:20:37Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 16, 2023, the ARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The ARC login node will reboot on the morning of January 16. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by January 20.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/24&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS Updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/02/28&lt;br /&gt;
| message =&lt;br /&gt;
We are currently investigating a filesystem issue that is causing slowdowns across ARC.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/03/01&lt;br /&gt;
| message =&lt;br /&gt;
We are still investigating a filesystem issue that is causing slowdowns across ARC. Some jobs on ARC have been paused to help us find the root cause of the slowdowns.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ARC Login node reboot&lt;br /&gt;
| date = 2023/03/02&lt;br /&gt;
| message =&lt;br /&gt;
The ARC login node will be rebooted this afternoon for emergency maintenance. This downtime is needed to help mitigate the filesystem slowdowns experienced on the login node. Jobs will continue running and scheduling during this time.&lt;br /&gt;
&lt;br /&gt;
All logins to the ARC login node will be terminated at 3:00 PM, and the node will remain unavailable until 4:00 PM.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Filesystem Issues&lt;br /&gt;
| date = 2023/03/02&lt;br /&gt;
| message =&lt;br /&gt;
We are still investigating a filesystem issue that is causing slowdowns on specific nodes in our MSRDC location.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues Resolved&lt;br /&gt;
| date = 2023/03/10&lt;br /&gt;
| message =&lt;br /&gt;
We have upgraded the filesystem routers in our MSRDC location to address the performance issues.&lt;br /&gt;
&lt;br /&gt;
Please let us know if you experience any issues with the filesystem performance.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/05/01&lt;br /&gt;
| message =&lt;br /&gt;
On May 1, 2023, the ARC Open OnDemand node will be rebooted between 5PM and 6PM. Expected downtime will be approximately 15 minutes.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on ARC Login Node&lt;br /&gt;
| date = 2023/06/22&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the ARC login node. If Apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to a security vulnerability that will be&lt;br /&gt;
patched soon.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Lattice, Single, cpu2013 partition changes&lt;br /&gt;
| date = 2023/07/13&lt;br /&gt;
| message =&lt;br /&gt;
The Lattice, Single, and cpu2013 partitions have all been decommissioned.  The Single&lt;br /&gt;
partition will be replaced by the nodes formerly in the cpu2013 partition but&lt;br /&gt;
will be called single.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/10/17&lt;br /&gt;
| message =&lt;br /&gt;
Open OnDemand will be rebooted on October 17, 2023 for an update. It will be down for up to 30 minutes.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Storage Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2023/10/23&lt;br /&gt;
| message =&lt;br /&gt;
We will be performing storage upgrades on the MARC/ARC cluster on &lt;br /&gt;
November 16 and 17, 2023. To facilitate this, we will be throttling &lt;br /&gt;
down the number of jobs on both clusters while the upgrades are &lt;br /&gt;
performed.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/05/03&lt;br /&gt;
| message =&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Power Interruption&lt;br /&gt;
| date = 2024/05/07&lt;br /&gt;
| message = ARC experienced a brief power outage around 11 AM on May 7, 2024.&lt;br /&gt;
Most compute nodes have rebooted or are rebooting.  Most jobs running at that time &lt;br /&gt;
were lost. ARC administrators are actively working on restarting compute &lt;br /&gt;
nodes. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation&lt;br /&gt;
| date = 2024/06/03&lt;br /&gt;
| message = Job submissions targeted to the GPU a100 partition will be &lt;br /&gt;
affected by a temporary reservation on the nodes to accommodate the RCS&lt;br /&gt;
summer school class taking place on 2024/Jun/10. The reservation will end &lt;br /&gt;
shortly afterwards. Please submit your jobs normally and the scheduler will &lt;br /&gt;
start them as soon as the nodes are available. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation Removed&lt;br /&gt;
| date = 2024/06/11&lt;br /&gt;
| message = GPU a100 Nodes in ARC have been returned to normal scheduling. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Notice of Upcoming Partial Outage&lt;br /&gt;
| date = 2024/08/27&lt;br /&gt;
| message = Several compute nodes from the ARC cluster will be unavailable &lt;br /&gt;
between Sept 23 and Sept 27 inclusive (subject to change). All compute nodes &lt;br /&gt;
in cpu2019, cpu2021, and gpu-v100, and most nodes from bigmem and gpu-a100, will be &lt;br /&gt;
affected. These nodes will return to service as soon as the work is complete.  &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Navbox ARC}}&lt;br /&gt;
[[Category:ARC]]&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Guide&amp;diff=3519</id>
		<title>ARC Cluster Guide</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Guide&amp;diff=3519"/>
		<updated>2024-08-19T21:31:00Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
{{Message Box&lt;br /&gt;
|title=[[Support|Need Help or have other ARC Related Questions?]]&lt;br /&gt;
|message=For all general RCS related issues, questions, or comments, please contact us at support@hpc.ucalgary.ca.&lt;br /&gt;
|icon=Support Icon.png}}&lt;br /&gt;
&lt;br /&gt;
This guide gives an overview of the Advanced Research Computing (ARC) cluster at the University of Calgary and is intended to be read by new account holders getting started on ARC. It covers topics such as hardware and performance characteristics, available software, usage policies, and how to log in and run jobs. ARC can be used with data that a researcher has classified as Lv1 or Lv2 as described in the UCalgary [https://www.ucalgary.ca/legal-services/sites/default/files/teams/1/Standards-Legal-Information-Security-Classification-Standard.pdf Information Security Classification Standard].&lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
ARC is a high-performance computing (HPC) cluster available for research projects based at the University of Calgary. The cluster is composed of hundreds of servers interconnected with a high-bandwidth network. Special resources within the cluster include large-memory nodes and GPU nodes. You may learn more about ARC&#039;s hardware in the [[ARC Cluster Guide#Hardware|hardware section below]]. ARC can be accessed through a [[Linux Introduction|command line interface]] or via a web interface called Open OnDemand.&lt;br /&gt;
&lt;br /&gt;
This cluster can be used for running large numbers (hundreds) of concurrent serial (one core) jobs, OpenMP or other thread-based jobs, shared-memory parallel code using up to 40 or 80 threads per job (depending on the partition), distributed-memory (MPI-based) parallel code using up to hundreds of cores, or jobs that take advantage of Graphics Processing Units (GPUs).&lt;br /&gt;
&lt;br /&gt;
Historically, ARC was primarily composed of older, disparate Linux-based clusters that were formerly offered to researchers from across Canada, such as Breezy, Lattice, and Parallel.  In addition, a large-memory compute node (Bigbyte) was salvaged from the now-retired local Storm cluster. In January 2019, a major addition to ARC with modern hardware was purchased. In 2020, compute clusters from CHGI were migrated into ARC.&lt;br /&gt;
&lt;br /&gt;
=== How to Get Started ===&lt;br /&gt;
If you have a project you think would be appropriate for ARC, please email support@hpc.ucalgary.ca and mention the intended research and software you plan to use. You must have a University of Calgary IT account in order to use ARC.&lt;br /&gt;
* For users who do not have a University of Calgary IT account or email address, please register for one at https://itregport.ucalgary.ca/.&lt;br /&gt;
* For users external to the University, such as collaborators on a research project at the University of Calgary, please contact us and mention the project leader you are collaborating with.&lt;br /&gt;
&lt;br /&gt;
Once your access to ARC has been granted, you will be able to immediately make use of the cluster using your University of Calgary IT account by following the [[ARC_Cluster_Guide#Using_ARC|usage guide outlined below]].&lt;br /&gt;
&lt;br /&gt;
== Using ARC ==&lt;br /&gt;
&lt;br /&gt;
{{Message Box&lt;br /&gt;
|icon=Security Icon.png&lt;br /&gt;
|title=Cybersecurity awareness at the U of C&lt;br /&gt;
|message=Please note that there are typically about 950 phishing attempts targeting University of Calgary accounts each month. This is just a reminder to be careful about computer security issues, both at home and at the University. Please visit https://it.ucalgary.ca/it-security for more information, tips on secure computing, and how to report suspected security problems.}}&lt;br /&gt;
&lt;br /&gt;
=== Logging in ===&lt;br /&gt;
To log in to ARC, connect using SSH to &amp;lt;code&amp;gt;arc.ucalgary.ca&amp;lt;/code&amp;gt; on port &amp;lt;code&amp;gt;22&amp;lt;/code&amp;gt;. Connections to ARC are accepted only from the University of Calgary network (on campus) or through the University of Calgary General VPN (off campus).&lt;br /&gt;
&lt;br /&gt;
See [[Connecting to RCS HPC Systems]] for more information.&lt;br /&gt;
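&lt;br /&gt;
For example, from a terminal on an on-campus machine (the username shown is illustrative; use your own UCalgary IT account name):&lt;br /&gt;
 ssh your.username@arc.ucalgary.ca&lt;br /&gt;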
=== How to interact with ARC ===&lt;br /&gt;
&lt;br /&gt;
The ARC cluster is a collection of many compute nodes connected by a high-speed network. On ARC, computations are submitted as jobs. Once submitted, jobs are assigned to compute nodes by the job scheduler as resources become available.&lt;br /&gt;
&lt;br /&gt;
[[File:Cluster.png]]&lt;br /&gt;
&lt;br /&gt;
You can access ARC with your UCalgary IT user credentials. Once connected, you will be placed on the ARC login node, which is meant for basic tasks such as submitting jobs, monitoring job status, managing files, and editing text. The login node is a shared resource to which many users are connected at the same time, so intensive tasks are not allowed there, as they may prevent other users from connecting or submitting their computations. &lt;br /&gt;
         [tannistha.nandi@arc ~]$ &lt;br /&gt;
The job scheduling system on ARC is called SLURM.  On ARC, there are two SLURM commands that can allocate resources to a job under appropriate conditions: ‘salloc’ and ‘sbatch’. They both accept the same set of command line options with respect to resource allocation. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;‘salloc’&#039;&#039;&#039; launches an interactive session, typically for tasks under 5 hours. &lt;br /&gt;
Once an interactive job session is created, you can do things like explore research datasets, start R or Python sessions to test your code, compile software applications, etc.&lt;br /&gt;
&lt;br /&gt;
a. Example 1: The following command requests 1 CPU on 1 node for 1 task, along with 1 GB of RAM, for an hour. &lt;br /&gt;
          [tannistha.nandi@arc ~]$ salloc --mem=1G -c 1 -N 1 -n 1  -t 01:00:00&lt;br /&gt;
          salloc: Granted job allocation 6758015&lt;br /&gt;
          salloc: Waiting for resource configuration&lt;br /&gt;
          salloc: Nodes fc4 are ready for job&lt;br /&gt;
          [tannistha.nandi@fc4 ~]$ &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
b. Example 2: The following command requests 1 GPU on 1 node in the gpu-v100 partition, along with 1 GB of RAM, for 1 hour.  Generic resource scheduling (--gres) is used to request GPU resources.&lt;br /&gt;
         [tannistha.nandi@arc ~]$ salloc --mem=1G -t 01:00:00 -p gpu-v100 --gres=gpu:1&lt;br /&gt;
         salloc: Granted job allocation 6760460&lt;br /&gt;
         salloc: Waiting for resource configuration&lt;br /&gt;
         salloc: Nodes fg3 are ready for job&lt;br /&gt;
         [tannistha.nandi@fg3 ~]$&lt;br /&gt;
&lt;br /&gt;
Once you finish the work, type &#039;exit&#039; at the command prompt to end the interactive session:&lt;br /&gt;
         [tannistha.nandi@fg3 ~]$ exit&lt;br /&gt;
         [tannistha.nandi@fg3 ~]$ salloc: Relinquishing job allocation 6760460&lt;br /&gt;
This ensures that the allocated resources are released from your job and become available to other users.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;‘sbatch’&#039;&#039;&#039; submits computations as jobs to run on the cluster. You can submit a job script such as job-script.slurm via &#039;sbatch&#039; for execution. &lt;br /&gt;
         [tannistha.nandi@arc ~]$ sbatch job-script.slurm&lt;br /&gt;
When resources become available, they are allocated to the job. Batch jobs are suited to tasks that run for long periods of time without user supervision. When the job script terminates, the allocation is released. &lt;br /&gt;
Please review the section on how to prepare job scripts for more information.&lt;br /&gt;
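&lt;br /&gt;
After submitting, you can monitor your jobs with standard Slurm commands; &#039;squeue&#039; lists your pending and running jobs, and &#039;scancel&#039; cancels a job by its ID (the ID below reuses the one from Example 1 and is illustrative):&lt;br /&gt;
         [tannistha.nandi@arc ~]$ squeue -u $USER&lt;br /&gt;
         [tannistha.nandi@arc ~]$ scancel 6758015&lt;br /&gt;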
&lt;br /&gt;
=== Prepare job scripts  ===&lt;br /&gt;
Job scripts are text files saved with an extension &#039;.slurm&#039;, for example, &#039;job-script.slurm&#039;. &lt;br /&gt;
A job script looks something like this:&lt;br /&gt;
    #!/bin/bash&lt;br /&gt;
    ####### Reserve computing resources #############&lt;br /&gt;
    #SBATCH --nodes=1&lt;br /&gt;
    #SBATCH --ntasks=1&lt;br /&gt;
    #SBATCH --cpus-per-task=1&lt;br /&gt;
    #SBATCH --time=01:00:00&lt;br /&gt;
    #SBATCH --mem=1G&lt;br /&gt;
    #SBATCH --partition=cpu2019&lt;br /&gt;
    ####### Set environment variables ###############&lt;br /&gt;
    module load python/anaconda3-2018.12&lt;br /&gt;
    ####### Run your script #########################&lt;br /&gt;
    python myscript.py&lt;br /&gt;
&lt;br /&gt;
The first line contains the text &amp;quot;#!/bin/bash&amp;quot; so that the file is interpreted as a bash script.&lt;br /&gt;
&lt;br /&gt;
It is followed by lines that start with &#039;#SBATCH&#039; to communicate with &#039;SLURM&#039;. You may add as many #SBATCH directives as needed to reserve computing resources for your task. The above example requests one CPU on a single node for 1 task, along with 1 GB of RAM, for an hour on the cpu2019 partition.&lt;br /&gt;
&lt;br /&gt;
Next, you have to set up environment variables, either by loading modules centrally installed on ARC or by exporting the path to software in your home directory. The above example loads an available Python module.&lt;br /&gt;
&lt;br /&gt;
Finally, include the Linux command to execute the local script.&lt;br /&gt;
&lt;br /&gt;
Note that failing to specify part of a resource allocation request (most notably &#039;&#039;&#039;time&#039;&#039;&#039; and &#039;&#039;&#039;memory&#039;&#039;&#039;) will result in bad resource requests, as the defaults are not appropriate for most cases. Please refer to the section &#039;Running non-interactive jobs&#039; for more examples.&lt;br /&gt;
&lt;br /&gt;
== Hardware ==&lt;br /&gt;
Since the ARC cluster is a conglomeration of many different compute clusters, the hardware within ARC can vary widely in terms of performance and capabilities.  To mitigate compatibility issues between different hardware, we group similar hardware into its own Slurm partition to ensure your workload runs as consistently as possible within one partition. Please carefully review the hardware specs for each of the partitions below to avoid any surprises.&lt;br /&gt;
&lt;br /&gt;
=== Partition Hardware Specs ===&lt;br /&gt;
When submitting jobs to ARC, you may specify a partition that your job will run on.  Please choose a partition that is most appropriate for your work.&lt;br /&gt;
&lt;br /&gt;
* See also [[How to find available partitions on ARC]].&lt;br /&gt;
&lt;br /&gt;
A few things to keep in mind when choosing a partition:&lt;br /&gt;
* Specific workloads requiring special Intel Instruction Set Extensions may only work on newer Intel CPUs. &lt;br /&gt;
* If working with multi-node parallel processing, ensure your software and libraries support the partition&#039;s interconnect networking.&lt;br /&gt;
* While older partitions may be slower, they may be less busy and have little to no wait times.&lt;br /&gt;
&lt;br /&gt;
If you are unsure which partition to use or need assistance on selecting an appropriate partition, please see [[#Selecting_a_Partition|the Selecting a Partition Section]] below. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Partition&lt;br /&gt;
! Description&lt;br /&gt;
! Nodes&lt;br /&gt;
! CPU Cores, Model, and Year&lt;br /&gt;
! Memory&lt;br /&gt;
! GPU&lt;br /&gt;
! Network&lt;br /&gt;
|-&lt;br /&gt;
| -&lt;br /&gt;
| ARC Login Node&lt;br /&gt;
| 1&lt;br /&gt;
| 16 cores, 2x Intel(R) Xeon(R) CPU E5620  @ 2.40GHz (Westmere, 2010)&lt;br /&gt;
| 48 GB&lt;br /&gt;
| N/A&lt;br /&gt;
| 40 Gbit/s InfiniBand&lt;br /&gt;
|-&lt;br /&gt;
| gpu-v100&lt;br /&gt;
| GPU Partition&lt;br /&gt;
| 13&lt;br /&gt;
| 80 cores, 2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (Skylake, 2019)&lt;br /&gt;
| 754 GB&lt;br /&gt;
| 2x Tesla V100-PCIE-16GB&lt;br /&gt;
| 100 Gbit/s Omni-Path&lt;br /&gt;
|-&lt;br /&gt;
|gpu-a100&lt;br /&gt;
|GPU Partition&lt;br /&gt;
|5&lt;br /&gt;
|40 cores, 1x Intel(R) Xeon(R) Silver 4316 CPU @ 2.30GHz (Ice Lake, 2021)&lt;br /&gt;
|512 GB&lt;br /&gt;
|2x GA100 A100 PCIe 80GB&lt;br /&gt;
|100 Gbit/s Mellanox Infiniband&lt;br /&gt;
|-&lt;br /&gt;
|cpu2023&lt;br /&gt;
|General Purpose Compute&lt;br /&gt;
|48&lt;br /&gt;
|64 cores, 2x Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake, 2021)&lt;br /&gt;
|512 GB&lt;br /&gt;
|N/A&lt;br /&gt;
|40 Gbit/s Mellanox Infiniband (temporarily)&lt;br /&gt;
|-&lt;br /&gt;
|cpu2022&lt;br /&gt;
|General Purpose Compute&lt;br /&gt;
|52&lt;br /&gt;
|52 cores, 2x Intel(R) Xeon(R) Gold 5320 CPU @ 2.20GHz (Ice Lake)&lt;br /&gt;
|256 GB&lt;br /&gt;
|N/A&lt;br /&gt;
|40 Gbit/s InfiniBand&lt;br /&gt;
|-&lt;br /&gt;
| cpu2021&lt;br /&gt;
| General Purpose Compute&lt;br /&gt;
| 48&lt;br /&gt;
| 48 cores, 2x Intel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz (Cascade Lake, 2021)&lt;br /&gt;
| 185 GB&lt;br /&gt;
| N/A &lt;br /&gt;
| 100 Gbit/s Mellanox Infiniband&lt;br /&gt;
|-&lt;br /&gt;
| cpu2019&lt;br /&gt;
| General Purpose Compute&lt;br /&gt;
| 14&lt;br /&gt;
| 40 cores, 2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (Skylake, 2019)&lt;br /&gt;
| 190 GB&lt;br /&gt;
| N/A &lt;br /&gt;
| 100 Gbit/s Omni-Path&lt;br /&gt;
|-&lt;br /&gt;
| apophis&lt;br /&gt;
| General Purpose Compute&lt;br /&gt;
| 21&lt;br /&gt;
| 40 cores, 2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (Skylake, 2019)&lt;br /&gt;
| 190 GB&lt;br /&gt;
| N/A &lt;br /&gt;
| 100 Gbit/s Omni-Path&lt;br /&gt;
|-&lt;br /&gt;
| razi&lt;br /&gt;
| General Purpose Compute&lt;br /&gt;
| 41&lt;br /&gt;
| 40 cores, 2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (Skylake, 2019)&lt;br /&gt;
| 190 GB&lt;br /&gt;
| N/A &lt;br /&gt;
| 100 Gbit/s Omni-Path&lt;br /&gt;
|-&lt;br /&gt;
| bigmem&lt;br /&gt;
| Big Memory Nodes&lt;br /&gt;
| 2&lt;br /&gt;
| 80 cores, 4x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (Skylake, 2019)&lt;br /&gt;
| 3022 GB&lt;br /&gt;
| N/A &lt;br /&gt;
| 100 Gbit/s Omni-Path&lt;br /&gt;
|-&lt;br /&gt;
| pawson&lt;br /&gt;
| General Purpose Compute&lt;br /&gt;
| 13&lt;br /&gt;
| 40 cores, 2x Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz (Skylake, 2019)&lt;br /&gt;
| 190 GB&lt;br /&gt;
| N/A&lt;br /&gt;
| 100 Gbit/s Omni-Path&lt;br /&gt;
|-&lt;br /&gt;
|cpu2017&lt;br /&gt;
|General Purpose Compute&lt;br /&gt;
|14&lt;br /&gt;
|56 cores, 2x Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz (Broadwell, 2016)&lt;br /&gt;
|256 GB&lt;br /&gt;
|N/A&lt;br /&gt;
|40 Gbit/s InfiniBand&lt;br /&gt;
|-&lt;br /&gt;
| theia&lt;br /&gt;
| Former Theia cluster&lt;br /&gt;
| 20&lt;br /&gt;
| 56 cores, 2x Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz (Broadwell, 2016)&lt;br /&gt;
| 188 GB&lt;br /&gt;
| N/A &lt;br /&gt;
| 40 Gbit/s InfiniBand&lt;br /&gt;
|-&lt;br /&gt;
| cpu2013&lt;br /&gt;
| Former hyperion cluster&lt;br /&gt;
| 12&lt;br /&gt;
| 32 cores, 2x Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz (Sandy Bridge, 2012)&lt;br /&gt;
| 126 GB&lt;br /&gt;
| N/A&lt;br /&gt;
| 40 Gbit/s InfiniBand&lt;br /&gt;
|-&lt;br /&gt;
| lattice&lt;br /&gt;
| Former Lattice cluster&lt;br /&gt;
| 307&lt;br /&gt;
| 8 cores, 2x Intel(R) Xeon(R) CPU L5520  @ 2.27GHz (Nehalem, 2009)&lt;br /&gt;
| 12 GB&lt;br /&gt;
| N/A&lt;br /&gt;
| 40 Gbit/s InfiniBand&lt;br /&gt;
|-&lt;br /&gt;
| single&lt;br /&gt;
| Former Lattice cluster&lt;br /&gt;
| 168&lt;br /&gt;
| 8 cores, 2x Intel(R) Xeon(R) CPU L5520  @ 2.27GHz (Nehalem, 2009)&lt;br /&gt;
| 12 GB&lt;br /&gt;
| N/A&lt;br /&gt;
| 40 Gbit/s InfiniBand&lt;br /&gt;
|-&lt;br /&gt;
| parallel&lt;br /&gt;
| Former Parallel Cluster&lt;br /&gt;
| 576&lt;br /&gt;
| 12 cores, 2x Intel(R) Xeon(R) CPU E5649  @ 2.53GHz (Westmere, 2011)&lt;br /&gt;
| 24 GB&lt;br /&gt;
| N/A&lt;br /&gt;
| 40 Gbit/s InfiniBand&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===ARC Cluster Storage===&lt;br /&gt;
Usage of ARC cluster storage is outlined by our [[ARC Storage Terms of Use]] page.&lt;br /&gt;
&lt;br /&gt;
{{Warning Box&lt;br /&gt;
| title=Data Storage&lt;br /&gt;
| message=ARC storage is not suitable for long-term or archival storage.  It is not backed-up and does not have sufficient redundancy to be used as a primary storage system.  It is not guaranteed to be available for the time periods that are typical of archiving.&lt;br /&gt;
&lt;br /&gt;
Please ensure that the only data you keep on ARC is used for active computations.&lt;br /&gt;
&lt;br /&gt;
For information on available campus storage options, please see [[Storage Options]].&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message Box&lt;br /&gt;
| title=No Backup Policy!&lt;br /&gt;
| message=You are responsible for your own backups.  Many researchers will have accounts with Compute Canada and may choose to back up their data there (the Project file system accessible through the Cedar cluster would often be used). &lt;br /&gt;
&lt;br /&gt;
Please contact us at support@hpc.ucalgary.ca if you want more information about this option.&lt;br /&gt;
&lt;br /&gt;
You can also back up data to your UofC OneDrive for business allocation see: https://rcs.ucalgary.ca/How_to_transfer_data#rclone:_rsync_for_cloud_storage This allocation starts at 5TB. Contact the support center for questions regarding OneDrive for Business.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
The ARC cluster has around 2 petabytes of shared disk storage available across the entire cluster, as well as temporary storage local to each compute node. Please refer to the individual sections below for capacity limitations and usage policies. &lt;br /&gt;
&lt;br /&gt;
Use the &amp;lt;code&amp;gt;arc.quota&amp;lt;/code&amp;gt; command on ARC to determine the available space on your various volumes and home directory.&lt;br /&gt;
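&lt;br /&gt;
For example:&lt;br /&gt;
 [tannistha.nandi@arc ~]$ arc.quota&lt;br /&gt;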
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Partition&lt;br /&gt;
!Description&lt;br /&gt;
!Capacity&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt;&lt;br /&gt;
|User home directories&lt;br /&gt;
|500 GB (per user)&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;/work&amp;lt;/code&amp;gt;&lt;br /&gt;
|Research project storage&lt;br /&gt;
|Up to 100s of TB&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt;&lt;br /&gt;
|Scratch space for temporary files&lt;br /&gt;
|Up to 15 TB&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;/tmp&amp;lt;/code&amp;gt;&lt;br /&gt;
|Temporary space local to each compute node&lt;br /&gt;
|Dependent on available storage on nodes. Verify with &amp;lt;code&amp;gt;df -h&amp;lt;/code&amp;gt;.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;/dev/shm&amp;lt;/code&amp;gt;&lt;br /&gt;
|Small temporary in-memory disk space local to each compute node&lt;br /&gt;
|Dependent on memory size set in your Slurm job.&lt;br /&gt;
|}&lt;br /&gt;
====&amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt;: Home file system====&lt;br /&gt;
Each user has a directory under /home, which is the default working directory when logging in to ARC. Each home directory has a per-user quota of 500 GB. This limit is fixed and cannot be increased. Researchers requiring additional storage exceeding what is available in their home directory may use &amp;lt;code&amp;gt;/work&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Note on file sharing: Due to security concerns, permissions set using &amp;lt;code&amp;gt;chmod&amp;lt;/code&amp;gt; on your home directory to allow other users to read from or write to it will be automatically reverted by an automated system process unless an explicit exception is made.  If you need to share files with other researchers on the ARC cluster, please write to support@hpc.ucalgary.ca to ask for such an exception.&lt;br /&gt;
&lt;br /&gt;
====&amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt;: Scratch file system for large job-oriented storage====&lt;br /&gt;
Associated with each job, under the &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; directory, a subdirectory is created that can be referenced in job scripts as &amp;lt;code&amp;gt;/scratch/${SLURM_JOB_ID}&amp;lt;/code&amp;gt;. You can use that directory for temporary files needed during the course of a job. Up to 15 TB of storage may be used, per user (total for all your jobs) in the &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; file system. &lt;br /&gt;
&lt;br /&gt;
Data in &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; associated with a given job will be deleted automatically, without exception, five days after the job finishes.&lt;br /&gt;
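&lt;br /&gt;
A minimal sketch of a job-script fragment using the per-job scratch directory (the program and file names are illustrative):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Stage input data into the per-job scratch directory&lt;br /&gt;
cp ~/project/input.dat /scratch/${SLURM_JOB_ID}/&lt;br /&gt;
cd /scratch/${SLURM_JOB_ID}&lt;br /&gt;
&lt;br /&gt;
# Run the computation against the scratch copy&lt;br /&gt;
./analysis input.dat &amp;gt; output.dat&lt;br /&gt;
&lt;br /&gt;
# Copy results back to /home before the five-day cleanup removes them&lt;br /&gt;
cp output.dat ~/project/&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;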
&lt;br /&gt;
====&amp;lt;code&amp;gt;/work&amp;lt;/code&amp;gt;: Work file system for larger projects====&lt;br /&gt;
If you need more space than provided in &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; and the &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; job-oriented space is not appropriate for your case, please write to support@hpc.ucalgary.ca with an explanation, including an indication of how much storage you expect to need and for how long.  If approved, you will then be assigned a directory under &amp;lt;code&amp;gt;/work&amp;lt;/code&amp;gt; with an appropriately large quota.&lt;br /&gt;
&lt;br /&gt;
====&amp;lt;code&amp;gt;/tmp&amp;lt;/code&amp;gt;,&amp;lt;code&amp;gt;/var/tmp&amp;lt;/code&amp;gt;: Temporary files====&lt;br /&gt;
You may use &amp;lt;code&amp;gt;/tmp&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;/var/tmp&amp;lt;/code&amp;gt; for storing temporary files generated by your job. &amp;lt;code&amp;gt;/tmp&amp;lt;/code&amp;gt; is stored on a disk local to the compute node and is not shared across the cluster. Files stored here will be removed immediately after your job terminates.&lt;br /&gt;
&lt;br /&gt;
==== &amp;lt;code&amp;gt;/dev/shm&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;/run/user/$uid&amp;lt;/code&amp;gt;: In-memory temporary files ====&lt;br /&gt;
&amp;lt;code&amp;gt;/dev/shm&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;/run/user/$UID&amp;lt;/code&amp;gt; are writable locations for temporary files backed by virtual memory. They can be used if faster I/O is required and are ideal for workloads that perform many small reads/writes to share data between processes, or as a fast cache. The amount of data you can write here depends on the amount of free memory available to your job. Files stored at these locations will be removed immediately after your job terminates.&lt;br /&gt;
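&lt;br /&gt;
For example, a job could stage a heavily re-read file into memory before processing it (a sketch; the paths and program are illustrative):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Copy a frequently read reference file into the in-memory filesystem&lt;br /&gt;
cp ~/project/reference.db /dev/shm/&lt;br /&gt;
&lt;br /&gt;
# Point the application at the in-memory copy; reads avoid disk I/O&lt;br /&gt;
./query --db /dev/shm/reference.db&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;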
&lt;br /&gt;
== Software ==&lt;br /&gt;
All ARC nodes run the latest version of Rocky Linux 8 with the same set of base software packages. To maintain the stability and consistency of all nodes, any additional dependencies that your software requires must be installed under your account.  For your convenience, we have packaged commonly used software packages and dependencies as modules available under &amp;lt;code&amp;gt;/global/software&amp;lt;/code&amp;gt;. If your software package is not available as a module, you may also try Anaconda, which allows users to manage and install custom packages in an isolated environment.&lt;br /&gt;
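&lt;br /&gt;
For example, a minimal sketch of creating an isolated Anaconda environment (the environment name and packages are illustrative):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Load a centrally installed Anaconda module&lt;br /&gt;
module load python/anaconda3-2018.12&lt;br /&gt;
&lt;br /&gt;
# Create and activate a personal environment in your home directory&lt;br /&gt;
conda create --name myenv numpy scipy&lt;br /&gt;
source activate myenv&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;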
&lt;br /&gt;
For a list of available packages, please see [[ARC Software pages]]. &lt;br /&gt;
&lt;br /&gt;
Please contact us at support@hpc.ucalgary.ca if you need additional software installed.&lt;br /&gt;
&lt;br /&gt;
==== Modules ====&lt;br /&gt;
The environment for using some of the installed software is set up through the &amp;lt;code&amp;gt;module&amp;lt;/code&amp;gt; command. An overview of [https://www.westgrid.ca//support/modules modules on WestGrid (external link)] is largely applicable to ARC.&lt;br /&gt;
&lt;br /&gt;
Software packages bundled as a module will be available under &amp;lt;code&amp;gt;/global/software&amp;lt;/code&amp;gt; and can be listed with the &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; command.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
$ module avail&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To enable Python, load the Python module by running:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
$ module load python/anaconda-3.6-5.1.0&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To unload the Python module, run:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
$ module remove python/anaconda-3.6-5.1.0&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To see currently loaded modules, run:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
$ module list&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default, no modules are loaded on ARC. If you wish to use a specific module, such as the Intel compilers or the Open MPI parallel programming packages, you must load the appropriate module.&lt;br /&gt;
&lt;br /&gt;
== Job submission ==&lt;br /&gt;
&lt;br /&gt;
=== Interactive Jobs ===&lt;br /&gt;
The ARC login node may be used for such tasks as editing files, compiling programs, and running short tests while developing programs. We suggest that CPU-intensive workloads on the login node be restricted to under 15 minutes, as per [[General Cluster Guidelines and Policies|our cluster guidelines]]. For interactive workloads exceeding 15 minutes, use the &#039;&#039;&#039;[[Running_jobs#Interactive_jobs|salloc command]]&#039;&#039;&#039; to allocate an interactive session on a compute node.&lt;br /&gt;
&lt;br /&gt;
The default salloc allocation is 1 CPU and 1 GB of memory. Adjust this by specifying &amp;lt;code&amp;gt;-n CPU#&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;--mem Megabytes&amp;lt;/code&amp;gt;. You may request up to 5 hours of CPU time for interactive jobs.&lt;br /&gt;
 salloc --time=5:00:00 --partition=cpu2019&lt;br /&gt;
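&lt;br /&gt;
For instance, to request 4 CPUs and 4 GB of memory for a two-hour interactive session (values are illustrative):&lt;br /&gt;
 salloc -n 4 --mem=4G --time=2:00:00 --partition=cpu2019&lt;br /&gt;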
&lt;br /&gt;
Always use salloc or srun to start an interactive job. Do not SSH directly to a compute node as SSH sessions will be refused without an active job running.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- This information doesn&#039;t seem that useful or relevant to running interactive jobs. Move to getting started section?&lt;br /&gt;
ARC uses the Linux operating system. The program that responds to your typed commands and allows you to run other programs is called the Linux shell. There are several different shells available, but, by default you will use one called bash. It is useful to have some knowledge of the shell and a variety of other command-line programs that you can use to manipulate files. If you are new to Linux systems, we recommend that you work through one of the many online tutorials that are available, such as the [http://www.ee.surrey.ac.uk/Teaching/Unix/index.html UNIX Tutorial for Beginners (external link)] provided by the University of Surrey. The tutorial covers such fundamental topics, among others, as creating, renaming and deleting files and directories, how to produce a listing of your files and how to tell how much disk space you are using.  For a more comprehensive introduction to Linux, see [http://linuxcommand.sourceforge.net/tlcl.php The Linux Command Line (external link)].&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Running non-interactive jobs (batch processing) ===&lt;br /&gt;
Production runs and longer test runs should be submitted as (non-interactive) batch jobs, in which commands to be executed are listed in a script (text file). Batch job scripts are submitted using the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command, part of the Slurm job management and scheduling software. #SBATCH directive lines at the beginning of the script are used to specify the resources needed for the job (cores, memory, run-time limit, and any specialized hardware needed).&lt;br /&gt;
&lt;br /&gt;
Most of the information on the [https://docs.computecanada.ca/wiki/Running_jobs Running Jobs (external link)] page on the Compute Canada web site is also relevant for submitting and managing batch jobs and reserving processors for interactive work on ARC.  One major difference between running jobs on the ARC and Compute Canada clusters is in selecting the type of hardware that should be used for a job. On ARC, you choose the hardware to use primarily by specifying a partition, as described below.&lt;br /&gt;
&lt;br /&gt;
=== Selecting a Partition ===&lt;br /&gt;
There are several aspects to consider when selecting a partition, including:&lt;br /&gt;
* Resource requirements in terms of memory and CPU cores&lt;br /&gt;
* Hardware specific requirements, such as GPU or CPU Instruction Set Extensions&lt;br /&gt;
* Partition resource limits and potential wait time&lt;br /&gt;
* Software support for parallel processing using Message Passing Interface (MPI), OpenMP, etc.&lt;br /&gt;
** E.g., since MPI-based code can distribute memory across multiple nodes, per-node memory requirements could be lower, whereas OpenMP or single-process code that is restricted to one node would require a higher-memory node.&lt;br /&gt;
** Note: MPI code running on hardware with Omni-Path networking should be compiled with Omni-Path networking support. This is provided by loading the &amp;lt;code&amp;gt;openmpi/2.1.3-opa&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;openmpi/3.1.2-opa&amp;lt;/code&amp;gt; modules prior to compiling, as sketched below.&lt;br /&gt;
&lt;br /&gt;
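A minimal sketch of compiling MPI code for the Omni-Path partitions (the source file name is illustrative):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Load an Omni-Path-enabled Open MPI build before compiling&lt;br /&gt;
module load openmpi/2.1.3-opa&lt;br /&gt;
&lt;br /&gt;
# Compile with the module&#039;s MPI wrapper compiler&lt;br /&gt;
mpicc -o my_mpi_program my_mpi_program.c&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;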
Since resources that are requested are reserved for your job, please request only as much CPU and memory as your job requires to avoid reducing the cluster efficiency.  If you are unsure which partition to use or the specific resource requests that are appropriate for your jobs, please contact us at [mailto:support@hpc.ucalgary.ca support@hpc.ucalgary.ca] and we would be happy to work with you.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;width: 100%;&amp;quot;&lt;br /&gt;
!Partition&lt;br /&gt;
!Description&lt;br /&gt;
!Cores/node&lt;br /&gt;
!Memory Request Limit&lt;br /&gt;
!Time Limit&lt;br /&gt;
!GPU&lt;br /&gt;
!Networking&lt;br /&gt;
|-&lt;br /&gt;
|cpu2021&lt;br /&gt;
|General Purpose Compute&lt;br /&gt;
|48&lt;br /&gt;
|185,000 MB&lt;br /&gt;
|7 days ‡&lt;br /&gt;
|&lt;br /&gt;
|100 Gbit/s Omni-Path&lt;br /&gt;
|-&lt;br /&gt;
|cpu2019&lt;br /&gt;
|General Purpose Compute&lt;br /&gt;
|40&lt;br /&gt;
|185,000 MB&lt;br /&gt;
|7 days ‡&lt;br /&gt;
|&lt;br /&gt;
|100 Gbit/s Omni-Path&lt;br /&gt;
|-&lt;br /&gt;
|bigmem&lt;br /&gt;
|Big Memory Compute&lt;br /&gt;
|80&lt;br /&gt;
|3,000,000 MB&lt;br /&gt;
|24 hours ‡&lt;br /&gt;
|&lt;br /&gt;
|100 Gbit/s Omni-Path&lt;br /&gt;
|-&lt;br /&gt;
|gpu-v100&lt;br /&gt;
|GPU Compute&lt;br /&gt;
|80&lt;br /&gt;
|753,000 MB&lt;br /&gt;
|24 hours ‡&lt;br /&gt;
|2&lt;br /&gt;
|100 Gbit/s Omni-Path&lt;br /&gt;
|-&lt;br /&gt;
|apophis&amp;amp;dagger;&lt;br /&gt;
|Private Research Partition&lt;br /&gt;
|40&lt;br /&gt;
|185,000 MB&lt;br /&gt;
|7 days ‡&lt;br /&gt;
|&lt;br /&gt;
|100 Gbit/s Omni-Path&lt;br /&gt;
|-&lt;br /&gt;
|razi&amp;amp;dagger;&lt;br /&gt;
|Private Research Partition&lt;br /&gt;
|40&lt;br /&gt;
|185,000 MB&lt;br /&gt;
|7 days ‡&lt;br /&gt;
|&lt;br /&gt;
|100 Gbit/s Omni-Path&lt;br /&gt;
|-&lt;br /&gt;
|pawson&amp;amp;dagger;&lt;br /&gt;
|Private Research Partition&lt;br /&gt;
|40&lt;br /&gt;
|185,000 MB&lt;br /&gt;
|7 days ‡&lt;br /&gt;
|&lt;br /&gt;
|100 Gbit/s Omni-Path&lt;br /&gt;
|-&lt;br /&gt;
|sherlock&amp;amp;dagger;&lt;br /&gt;
|Private Research Partition&lt;br /&gt;
|7&lt;br /&gt;
|185,000 MB&lt;br /&gt;
|7 days ‡&lt;br /&gt;
|&lt;br /&gt;
|100 Gbit/s Omni-Path&lt;br /&gt;
|-&lt;br /&gt;
|theia&amp;amp;dagger;&lt;br /&gt;
|Private Research Partition&lt;br /&gt;
|28&lt;br /&gt;
|188,000 MB&lt;br /&gt;
|7 days ‡&lt;br /&gt;
|&lt;br /&gt;
|40 Gbit/s InfiniBand&lt;br /&gt;
|-&lt;br /&gt;
|synergy&amp;amp;dagger;&lt;br /&gt;
|Private Research Partition&lt;br /&gt;
|14&lt;br /&gt;
|245,000 MB&lt;br /&gt;
|7 days ‡&lt;br /&gt;
|&lt;br /&gt;
|40 Gbit/s InfiniBand&lt;br /&gt;
|-&lt;br /&gt;
|cpu2013&lt;br /&gt;
|Legacy General Purpose Compute&lt;br /&gt;
|16&lt;br /&gt;
|120,000 MB&lt;br /&gt;
|7 days ‡&lt;br /&gt;
|&lt;br /&gt;
|40 Gbit/s InfiniBand&lt;br /&gt;
|-&lt;br /&gt;
|lattice&lt;br /&gt;
|Legacy General Purpose Compute&lt;br /&gt;
|8&lt;br /&gt;
|12,000 MB&lt;br /&gt;
|7 days ‡&lt;br /&gt;
|&lt;br /&gt;
|40 Gbit/s InfiniBand&lt;br /&gt;
|-&lt;br /&gt;
|parallel&lt;br /&gt;
|Legacy General Purpose Compute&lt;br /&gt;
|12&lt;br /&gt;
|23,000 MB&lt;br /&gt;
|7 days ‡&lt;br /&gt;
|&lt;br /&gt;
|40 Gbit/s InfiniBand&lt;br /&gt;
|-&lt;br /&gt;
|single&lt;br /&gt;
|Legacy Single-Node Job Compute&lt;br /&gt;
|8&lt;br /&gt;
|12,000 MB&lt;br /&gt;
|7 days ‡&lt;br /&gt;
|&lt;br /&gt;
|40 Gbit/s InfiniBand&lt;br /&gt;
|-&lt;br /&gt;
|cpu2021-bf24&lt;br /&gt;
|Back-fill Compute (2021-era hardware, 24h)&lt;br /&gt;
|48&lt;br /&gt;
|185,000 MB&lt;br /&gt;
|24 hours ‡&lt;br /&gt;
|&lt;br /&gt;
|100 Gbit/s Omni-Path&lt;br /&gt;
|-&lt;br /&gt;
|cpu2019-bf05&lt;br /&gt;
|Back-fill Compute (2019-era hardware, 5h)&lt;br /&gt;
|40&lt;br /&gt;
|185,000 MB&lt;br /&gt;
|5 hours ‡&lt;br /&gt;
|&lt;br /&gt;
|100 Gbit/s Omni-Path&lt;br /&gt;
|-&lt;br /&gt;
|cpu2017-bf05&lt;br /&gt;
|Back-fill Compute (2017-era hardware, 5h)&lt;br /&gt;
|14&lt;br /&gt;
|245,000 MB&lt;br /&gt;
|5 hours ‡&lt;br /&gt;
|&lt;br /&gt;
|40 Gbit/s InfiniBand&lt;br /&gt;
|-&lt;br /&gt;
|+ style=&amp;quot;caption-side: bottom; text-align: left; font-weight: normal;&amp;quot; | &amp;amp;dagger; These partitions contain hardware contributed to ARC by particular researchers and should only be used by members of their research groups. However, they have generously allowed their compute nodes to be shared with others outside their research groups for short jobs.  Special &#039;back-fill&#039; (-bf) partitions are available for use by all ARC users for short jobs (see the time limits above).&amp;lt;br /&amp;gt;‡ As time limits may be changed by administrators to adjust to maintenance schedules or system load, the values given in the tables are not definitive.  See the Time limits section below for commands you can use on ARC itself to determine current limits.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==== Backfill partitions ====&lt;br /&gt;
Backfill partitions can be used by all users on ARC for short-term jobs. The hardware backing these partitions is generously contributed by researchers.  We recommend including the backfill partitions for short-term jobs, as doing so may help reduce your job&#039;s wait time and increase overall cluster throughput.&lt;br /&gt;
&lt;br /&gt;
Previously, each contributing research group had their own backfill partition. Since June 2021, we have merged:&lt;br /&gt;
&lt;br /&gt;
* apophis-bf, pawson-bf, and razi-bf into cpu2019-bf05 &lt;br /&gt;
* theia-bf and synergy-bf into cpu2017-bf05&lt;br /&gt;
&lt;br /&gt;
The naming scheme of the backfill partitions is the CPU generation year, followed by -bf and the time limit in hours.  For example, cpu2017-bf05 would represent a backfill partition containing processors from 2017 with a time limit of 5 hours.&lt;br /&gt;
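&lt;br /&gt;
For example, a short job could list a regular partition together with its matching backfill partition (a sketch; the scheduler will start the job on whichever can run it first):&lt;br /&gt;
 #SBATCH --time=04:00:00&lt;br /&gt;
 #SBATCH --partition=cpu2019,cpu2019-bf05&lt;br /&gt;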
&lt;br /&gt;
==== Hardware resource and job policy limits ====&lt;br /&gt;
In addition to the hardware limitations, please be aware that there may also be policy limits imposed on your account for each partition. These limits restrict the number of cores, nodes, or GPUs that can be used at any given time. Since the limits are applied on a partition-by-partition basis, using resources in one partition should not affect the available resources you can use in another partition.&lt;br /&gt;
&lt;br /&gt;
These limits can be listed by running:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
$ sacctmgr show qos format=Name,MaxWall,MaxTRESPU%20,MaxSubmitJobs&lt;br /&gt;
      Name     MaxWall            MaxTRESPU MaxSubmit&lt;br /&gt;
---------- ----------- -------------------- ---------&lt;br /&gt;
    normal  7-00:00:00                           2000&lt;br /&gt;
    breezy  3-00:00:00              cpu=384      2000&lt;br /&gt;
       gpu  7-00:00:00                          13000&lt;br /&gt;
   cpu2019  7-00:00:00              cpu=240      2000&lt;br /&gt;
  gpu-v100  1-00:00:00    cpu=80,gres/gpu=4      2000&lt;br /&gt;
    single  7-00:00:00      cpu=408,node=75      2000&lt;br /&gt;
      razi  7-00:00:00                           2000&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Specifying a partition in a job ====&lt;br /&gt;
Once you have decided which partition best suits your computation, you can select one or more partitions on a job-by-job basis by including the &amp;lt;code&amp;gt;partition&amp;lt;/code&amp;gt; keyword in an &amp;lt;code&amp;gt;SBATCH&amp;lt;/code&amp;gt; directive in your batch job. Multiple partitions should be comma separated.  If you omit the partition specification, the system will try to assign your job to appropriate hardware based on other aspects of your request. &lt;br /&gt;
&lt;br /&gt;
In some cases, you really should specify the partition explicitly.  For example, if you are running single-node jobs with thread-based parallel processing requesting 8 cores you could use:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#SBATCH --mem=0              ❶&lt;br /&gt;
#SBATCH --nodes=1            ❷&lt;br /&gt;
#SBATCH --ntasks=1           ❸&lt;br /&gt;
#SBATCH --cpus-per-task=8    ❹&lt;br /&gt;
#SBATCH --partition=single,lattice   ❺ &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A few things to mention in this example:&lt;br /&gt;
# &amp;lt;code&amp;gt;--mem=0&amp;lt;/code&amp;gt; allocates all available memory on the compute node for the job. This effectively allocates the entire node for your job.&lt;br /&gt;
# &amp;lt;code&amp;gt;--nodes=1&amp;lt;/code&amp;gt; allocates 1 node for the job&lt;br /&gt;
# &amp;lt;code&amp;gt;--ntasks=1&amp;lt;/code&amp;gt; your job has a single task&lt;br /&gt;
# &amp;lt;code&amp;gt;--cpus-per-task=8&amp;lt;/code&amp;gt; asks for 8 CPUs per task. This job in total will request 8 * 1, or 8 CPUs.&lt;br /&gt;
# &amp;lt;code&amp;gt;--partition=single,lattice&amp;lt;/code&amp;gt; specifies that this job can run on either single or lattice.&lt;br /&gt;
Suppose that your job requires at most 8 CPU cores and 10 GB of memory. The above Slurm request would be valid and optimal, since your job fits neatly in a single node in the single and lattice partitions.  However, if you failed to specify the partition, Slurm may try to schedule your job onto a partition with larger nodes, such as cpu2019, where each node has 40 cores and 190 GB of memory. If your job is scheduled on such a node, it will effectively waste 32 cores and 180 GB of memory, because &amp;lt;code&amp;gt;--mem=0&amp;lt;/code&amp;gt; not only requests all 190 GB on that node but also prevents other jobs from being scheduled on the same node.&lt;br /&gt;
&lt;br /&gt;
If you don&#039;t specify a partition, please give greater thought to the memory specification to make sure that the scheduler will not assign your job more resources than are needed.&lt;br /&gt;
&lt;br /&gt;
Parameters such as &#039;&#039;&#039;--ntasks-per-node&#039;&#039;&#039;, &#039;&#039;&#039;--cpus-per-task&#039;&#039;&#039;, &#039;&#039;&#039;--mem&#039;&#039;&#039; and &#039;&#039;&#039;--mem-per-cpu&#039;&#039;&#039; also have to be adjusted according to the capabilities of the hardware. The product of --ntasks-per-node and --cpus-per-task should be less than or equal to the number given in the &amp;quot;Cores/node&amp;quot; column.  The &#039;&#039;&#039;--mem&#039;&#039;&#039; parameter (or the product of &#039;&#039;&#039;--mem-per-cpu&#039;&#039;&#039; and &#039;&#039;&#039;--cpus-per-task&#039;&#039;&#039;) should be less than the &amp;quot;Memory Request Limit&amp;quot; shown. If using whole nodes, you can specify &#039;&#039;&#039;--mem=0&#039;&#039;&#039; to request the maximum amount of memory per node.&lt;br /&gt;
&lt;br /&gt;
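As an illustrative sketch of this arithmetic, assuming the 40-core nodes and the 185000 MB memory limit mentioned on this page (check the hardware table for the actual values):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#SBATCH --partition=cpu2019&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=4    # 4 tasks x 10 CPUs each = 40 CPUs, within 40 cores/node&lt;br /&gt;
#SBATCH --cpus-per-task=10&lt;br /&gt;
#SBATCH --mem-per-cpu=4000M    # 40 CPUs x 4000 MB = 160000 MB, under the 185000 MB limit&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;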
===== Examples =====&lt;br /&gt;
Here are some examples of specifying the various partitions.&lt;br /&gt;
&lt;br /&gt;
As mentioned in the [[#Hardware|Hardware]] section above, the ARC cluster was expanded in January 2019.  To select the 40-core general purpose nodes specify:&lt;br /&gt;
&lt;br /&gt;
 #SBATCH --partition=cpu2019&lt;br /&gt;
&lt;br /&gt;
To run on the Tesla V100 GPU-enabled nodes, use the &#039;&#039;&#039;gpu-v100&#039;&#039;&#039; partition.  You will also need to include an SBATCH directive in the form &#039;&#039;&#039;--gres=gpu:n&#039;&#039;&#039; to specify the number of GPUs, n, that you need.  For example, if the software you are running can make use of both GPUs on a gpu-v100 partition compute node, use:&lt;br /&gt;
&lt;br /&gt;
 #SBATCH --partition=gpu-v100 --gres=gpu:2&lt;br /&gt;
&lt;br /&gt;
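A complete batch script combining the GPU request with the other directives discussed above might look like the following sketch (the program name and resource figures are placeholders, not recommendations):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --partition=gpu-v100&lt;br /&gt;
#SBATCH --gres=gpu:2          # both GPUs on the node&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task=4&lt;br /&gt;
#SBATCH --mem=16G&lt;br /&gt;
#SBATCH --time=04:00:00&lt;br /&gt;
&lt;br /&gt;
# my_gpu_program is a placeholder for your own GPU-enabled executable&lt;br /&gt;
./my_gpu_program&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;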
For very large memory jobs (more than 185000 MB), specify the bigmem partition:&lt;br /&gt;
&lt;br /&gt;
 #SBATCH --partition=bigmem&lt;br /&gt;
&lt;br /&gt;
If the more modern computers are too busy, or you have a job well suited to the compute nodes described in the legacy hardware section above, choose the cpu2013, lattice or parallel compute nodes by specifying the corresponding partition keyword:&lt;br /&gt;
&lt;br /&gt;
 #SBATCH --partition=cpu2013&lt;br /&gt;
 #SBATCH --partition=lattice&lt;br /&gt;
 #SBATCH --partition=parallel&lt;br /&gt;
&lt;br /&gt;
There is an additional partition called &#039;&#039;&#039;single&#039;&#039;&#039; that provides nodes similar to the lattice partition, but is intended for single-node jobs. Select the single partition with:&lt;br /&gt;
&lt;br /&gt;
 #SBATCH --partition=single&lt;br /&gt;
&lt;br /&gt;
=== Time limits ===&lt;br /&gt;
Use the &amp;lt;code&amp;gt;--time&amp;lt;/code&amp;gt; directive to tell the job scheduler the maximum time that your job might run.  For example:&lt;br /&gt;
 #SBATCH --time=hh:mm:ss&lt;br /&gt;
&lt;br /&gt;
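For example, a job expected to run for at most one day and twelve hours would specify:&lt;br /&gt;
&lt;br /&gt;
 #SBATCH --time=1-12:00:00&lt;br /&gt;
&lt;br /&gt;
Slurm also accepts the forms &amp;lt;code&amp;gt;minutes&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;minutes:seconds&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;days-hours&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;days-hours:minutes&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;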
You can use &amp;lt;code&amp;gt;scontrol show partitions&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;sinfo&amp;lt;/code&amp;gt; to see the current maximum time that a job can run.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot; highlight=&amp;quot;6&amp;quot;&amp;gt;&lt;br /&gt;
$ scontrol show partitions&lt;br /&gt;
PartitionName=single                                                                 &lt;br /&gt;
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL                                    &lt;br /&gt;
   AllocNodes=ALL Default=NO QoS=single                                              &lt;br /&gt;
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO        &lt;br /&gt;
   MaxNodes=UNLIMITED MaxTime=7-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED  &lt;br /&gt;
   Nodes=cn[001-168]                                                                 &lt;br /&gt;
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO        &lt;br /&gt;
   OverTimeLimit=NONE PreemptMode=OFF                                                &lt;br /&gt;
   State=UP TotalCPUs=1344 TotalNodes=168 SelectTypeParameters=NONE                  &lt;br /&gt;
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED                                   &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, with &amp;lt;code&amp;gt;sinfo&amp;lt;/code&amp;gt; under the &amp;lt;code&amp;gt;TIMELIMIT&amp;lt;/code&amp;gt; column:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
$ sinfo                                                     &lt;br /&gt;
PARTITION  AVAIL  TIMELIMIT  NODES  STATE NODELIST               &lt;br /&gt;
single        up 7-00:00:00      1 drain* cn097                  &lt;br /&gt;
single        up 7-00:00:00      1  maint cn002                  &lt;br /&gt;
single        up 7-00:00:00      4 drain* cn[001,061,133,154]    &lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
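If you only need the time limits, the output-format option of &amp;lt;code&amp;gt;sinfo&amp;lt;/code&amp;gt; can narrow the columns. A short sketch using the standard &amp;lt;code&amp;gt;%P&amp;lt;/code&amp;gt; (partition) and &amp;lt;code&amp;gt;%l&amp;lt;/code&amp;gt; (time limit) format specifiers:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Print each partition name together with its maximum job time&lt;br /&gt;
$ sinfo -o &amp;quot;%P %l&amp;quot;&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;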
== Support ==&lt;br /&gt;
{{Support&lt;br /&gt;
|title=[[Support|Need Help or have other ARC Related Questions?]]&lt;br /&gt;
|message=For all general RCS related issues, questions, or comments, please contact us at support@hpc.ucalgary.ca.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Please don&#039;t hesitate to [[Support|contact us]] directly by email if you need help using ARC or require guidance on migrating your workflows to ARC and running them there.&lt;br /&gt;
&lt;br /&gt;
{{Navbox ARC}}&lt;br /&gt;
[[Category:ARC]]&lt;br /&gt;
[[Category:Guides]]&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=TALC_Cluster_Status&amp;diff=3454</id>
		<title>TALC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=TALC_Cluster_Status&amp;diff=3454"/>
		<updated>2024-06-26T17:11:24Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{TALC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ May System Updates&lt;br /&gt;
| date = 2023/02/02&lt;br /&gt;
| message =&lt;br /&gt;
Beginning May 1, 2023, the TALC cluster will undergo operating system updates. The upgrade will happen after the end of term to minimize any disruption. Any existing jobs may be temporarily held from scheduling. The upgrade is planned to be fully complete by May 5.&lt;br /&gt;
&lt;br /&gt;
The TALC login node will reboot on the morning of May 1.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = May System Updates Completed&lt;br /&gt;
| date = 2023/05/04&lt;br /&gt;
| message =&lt;br /&gt;
TALC upgrades have been completed. If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Power Interruption&lt;br /&gt;
| date = 2024/05/07&lt;br /&gt;
| message = TALC experienced a brief power outage around 11 AM on May 7, 2024.&lt;br /&gt;
Most compute nodes have rebooted or are rebooting. Most jobs running at the time&lt;br /&gt;
were lost. Administrators are actively working on restarting the compute&lt;br /&gt;
nodes. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/06/26&lt;br /&gt;
| message = &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
[[Category:TALC]]&lt;br /&gt;
{{Navbox TALC}}&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
	<entry>
		<id>https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3412</id>
		<title>ARC Cluster Status</title>
		<link rel="alternate" type="text/html" href="https://rcs.ucalgary.ca/index.php?title=ARC_Cluster_Status&amp;diff=3412"/>
		<updated>2024-06-11T19:33:07Z</updated>

		<summary type="html">&lt;p&gt;Cemagata: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{ARC Cluster Status}}&lt;br /&gt;
&lt;br /&gt;
== System Messages ==&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = January System Updates&lt;br /&gt;
| date = 2023/01/01&lt;br /&gt;
| message =&lt;br /&gt;
Beginning January 16, 2023, the ARC cluster will undergo operating system updates. We shall do our utmost to minimize disruption and allow ongoing jobs to be completed. New jobs may be temporarily held from scheduling.&lt;br /&gt;
&lt;br /&gt;
The ARC login node will reboot on the morning of January 16. Please save your work and log out if possible.&lt;br /&gt;
&lt;br /&gt;
The upgrade is planned to be fully complete by January 20.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = System Updates Completed&lt;br /&gt;
| date = 2023/01/24&lt;br /&gt;
| message =&lt;br /&gt;
The upgrade has been completed. The following has been changed:&lt;br /&gt;
* OS Updated to Rocky Linux 8.7&lt;br /&gt;
* Slurm updated to 22.05.7&lt;br /&gt;
* Apptainer replaces Singularity&lt;br /&gt;
* Each job will have its own /tmp, /dev/shm, /run/user/$uid mounted&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/02/28&lt;br /&gt;
| message =&lt;br /&gt;
We are currently investigating a filesystem issue that is causing filesystem slowdowns across ARC.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues&lt;br /&gt;
| date = 2023/03/1&lt;br /&gt;
| message =&lt;br /&gt;
We are still currently investigating a filesystem issue that is causing filesystem slowdowns across ARC. Some jobs on ARC have been paused to help us find the root cause of the slowdowns.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ARC Login node reboot&lt;br /&gt;
| date = 2023/03/2&lt;br /&gt;
| message =&lt;br /&gt;
The ARC login node will be rebooted this afternoon for an emergency maintenance. This downtime is needed to help mitigate the filesystem slowdowns experienced on the login node.  Jobs will continue running and scheduling during this time.&lt;br /&gt;
&lt;br /&gt;
All logins to the ARC login node will be terminated at 3:00 PM, and the node will remain unavailable until 4:00 PM.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = ⚠️ Filesystem Issues&lt;br /&gt;
| date = 2023/03/02&lt;br /&gt;
| message =&lt;br /&gt;
We are still investigating a filesystem issue that is causing slowdowns on specific nodes in our MSRDC location.&lt;br /&gt;
&lt;br /&gt;
We will update you with more information as it becomes available.&lt;br /&gt;
&lt;br /&gt;
We apologize for the inconvenience and thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Filesystem Issues Resolved&lt;br /&gt;
| date = 2023/03/10&lt;br /&gt;
| message =&lt;br /&gt;
We have upgraded the filesystem routers in our MSRDC location to address the performance issues.&lt;br /&gt;
&lt;br /&gt;
Please let us know if you experience any issues with the filesystem performance.&lt;br /&gt;
&lt;br /&gt;
Thank you for your patience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/05/01&lt;br /&gt;
| message =&lt;br /&gt;
On May 1, 2023, the ARC Open OnDemand node will be rebooted between 5PM and 6PM. Expected downtime will be approximately 15 minutes.&lt;br /&gt;
&lt;br /&gt;
If you encounter any system issues, do not hesitate to let us know.&lt;br /&gt;
&lt;br /&gt;
Thank you for your cooperation.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Apptainer (Singularity) on ARC Login Node&lt;br /&gt;
| date = 2023/06/22&lt;br /&gt;
| message =&lt;br /&gt;
Apptainer (Singularity) containers may experience an error when&lt;br /&gt;
running on the ARC login node. If Apptainer complains that a system&lt;br /&gt;
administrator needs to enable user namespaces, simply run your&lt;br /&gt;
containers inside a job.&lt;br /&gt;
&lt;br /&gt;
This is a temporary measure due to a security vulnerability that will be&lt;br /&gt;
patched soon.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Lattice, Single, cpu2013 partition changes&lt;br /&gt;
| date = 2023/07/13&lt;br /&gt;
| message =&lt;br /&gt;
The lattice, single, and cpu2013 partitions have all been decommissioned. The&lt;br /&gt;
single partition will be re-created from the nodes formerly in the cpu2013&lt;br /&gt;
partition and will keep the name single.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Open OnDemand reboot&lt;br /&gt;
| date = 2023/10/17&lt;br /&gt;
| message =&lt;br /&gt;
Open OnDemand will be rebooted on October 17, 2023 for an update. It will be down for up to 30 minutes.&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Storage Upgrade MARC/ARC cluster&lt;br /&gt;
| date = 2023/10/23&lt;br /&gt;
| message =&lt;br /&gt;
We will be performing storage upgrades on the MARC/ARC cluster on&lt;br /&gt;
November 16 and 17, 2023. To facilitate this, we will be throttling&lt;br /&gt;
down the number of jobs on both clusters while the upgrades are&lt;br /&gt;
performed.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Systems Operating Normally&lt;br /&gt;
| date = 2024/05/03&lt;br /&gt;
| message =&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = Power Interruption&lt;br /&gt;
| date = 2024/05/07&lt;br /&gt;
| message = ARC experienced a brief power outage around 11 AM on May 7, 2024.&lt;br /&gt;
Most compute nodes have rebooted or are rebooting. Most jobs running at the time&lt;br /&gt;
were lost. ARC administrators are actively working on restarting the compute&lt;br /&gt;
nodes. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation&lt;br /&gt;
| date = 2024/06/03&lt;br /&gt;
| message = Job submissions targeted to the GPU a100 partition will be&lt;br /&gt;
affected by a temporary reservation on the nodes to accommodate the RCS&lt;br /&gt;
summer school class taking place on 2024/Jun/10. The reservation will end&lt;br /&gt;
shortly afterwards. Please submit your jobs normally; the scheduler will&lt;br /&gt;
start them as soon as the nodes are available. Sorry for the inconvenience.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Message of the day item&lt;br /&gt;
| title = GPU a100 Node Reservation Removed&lt;br /&gt;
| date = 2024/06/11&lt;br /&gt;
| message = GPU a100 nodes in ARC have been returned to normal scheduling. &lt;br /&gt;
}}&lt;br /&gt;
{{Navbox ARC}}&lt;br /&gt;
[[Category:ARC]]&lt;/div&gt;</summary>
		<author><name>Cemagata</name></author>
	</entry>
</feed>