Jupyter Notebooks

From RCSWiki
Latest revision as of 20:34, 18 February 2025

Jupyter Notebook is an interactive web-based environment used for creating a wide range of workflows in data science, machine learning, and general computation. Jupyter Notebooks support many programming languages, including Python, Julia, and R.

Related products help launch and manage Jupyter Notebooks, including:

  • Jupyter Hub, which allows users to launch a notebook instance from a web portal.
  • JupyterLab, which is Jupyter's next-generation notebook interface and comes with a more flexible user interface.

This page goes over the options available to you for getting started with Jupyter Notebook.

Resources:


ARC

RCS offers the following methods to use Jupyter on ARC. You must have an ARC account for the following options.

Open OnDemand

Main Article: Open OnDemand

Jupyter Notebook can be launched through Open OnDemand on ARC accessible at https://ood-arc.rcs.ucalgary.ca . You may launch Jupyter Notebook or JupyterLab directly on the cluster with support for a wide range of programming languages using one of the many pre-made Jupyter environments. This is the preferred way to launch Jupyter Notebooks due to its ease of use and flexibility.

The Alliance Jupyter Hub for UofC (Syzygy)

The Pacific Institute for the Mathematical Sciences, in collaboration with the Digital Research Alliance of Canada and Cybera, offers cloud-based hubs to universities and schools. Each institution can have its own hub where users authenticate with their credentials from that institution. The hubs are hosted on the Alliance Cloud and are intended primarily for training purposes.

Log in with your UofC credentials.

The hub supports Python 2, Python 3, Julia, and R.

  • An article about the service:
https://www.computecanada.ca/featured/compute-canada-and-pims-launch-jupyter-service-for-researchers/

Here is some information about the hub:

  • Teaching with Syzygy:
https://intro.syzygy.ca/teaching/
  • FAQ about Syzygy:
https://intro.syzygy.ca/troubleshooting/

Jupyter on the Digital Research Alliance of Canada clusters

The Alliance services require an Alliance account. An account can be obtained at https://ccdb.computecanada.ca

  • General information on running a Jupyter server on a compute node:
https://docs.alliancecan.ca/wiki/JupyterHub

Beluga cluster Jupyter Hub

At https://jupyterhub.beluga.calculquebec.ca/hub/login

  • Log in with your Compute Canada credentials.
  • In the Server Options dialog, select the account, time, number of CPUs, and amount of memory.
  • Click the Start button.
  • Once the server starts, click on the Python 3.7 notebook button to get a Jupyter Python notebook session.
  • To exit, select File -> Logout in the menu bar.

Jupyter Hub at the University of Toronto

  • Niagara cluster.
https://docs.scinet.utoronto.ca/index.php/Jupyter_Hub

Running a jupyter notebook on ARC's compute node

By using jupyter from a python installation on ARC, one can run jupyter notebooks on ARC's compute nodes and request resources according to one's need. This is the most flexible way to run a notebook, but it requires more steps to set up.

Start an interactive job on ARC:

$ salloc -N1 -n1 -c4 --mem=16gb -t 3:00:00 -p single
salloc: Granted job allocation 33874452
salloc: Nodes h14 are ready for job
[username@h14 ~]$ 

The command requests a 3-hour interactive job on the single partition, with 4 CPUs and 16 GB of RAM on one compute node. The compute node h14 was allocated to the job.
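To make the meaning of each option explicit, the request above can be assembled from named values. This is only an illustrative sketch: the variable names are hypothetical, while the flags and values are the ones used in the example. Adjust the values to your own needs.

```shell
# Hypothetical variable names; the flags are the ones from the example above.
nodes=1             # -N : number of compute nodes
tasks=1             # -n : number of tasks
cpus=4              # -c : CPUs per task
mem=16gb            # --mem : memory per node
walltime=3:00:00    # -t : time limit (hours:minutes:seconds)
partition=single    # -p : partition to run on

# Build the command line and print it for inspection (it is not executed here).
salloc_cmd="salloc -N${nodes} -n${tasks} -c${cpus} --mem=${mem} -t ${walltime} -p ${partition}"
echo "$salloc_cmd"
```

Printing the command first makes it easy to check the request before running it on a login node.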

Load a python software module you want to use and start a jupyter notebook server.

In the past, the XDG_RUNTIME_DIR variable had to be unset before starting the notebook, otherwise the notebook could crash. You do not need to do this now, but the command is still shown in the example below.

$ module load python/3.12.5
$ unset XDG_RUNTIME_DIR

$ jupyter notebook --no-browser --ip=0.0.0.0

[I 2025-02-18 13:30:12.439 ServerApp] jupyter_lsp | extension was successfully linked.
[I 2025-02-18 13:30:12.443 ServerApp] jupyter_server_terminals | extension was successfully linked.
[I 2025-02-18 13:30:12.449 ServerApp] jupyterlab | extension was successfully linked.
[I 2025-02-18 13:30:12.454 ServerApp] notebook | extension was successfully linked.
[I 2025-02-18 13:30:12.961 ServerApp] notebook_shim | extension was successfully linked.
[I 2025-02-18 13:30:13.019 ServerApp] notebook_shim | extension was successfully loaded.
....
....
[C 2025-02-18 13:30:13.073 ServerApp] 
    
    To access the server, open this file in a browser:
        file:///home/drozmano/.local/share/jupyter/runtime/jpserver-78009-open.html
    Or copy and paste one of these URLs:
        http://h14:8888/tree?token=9beed319458a43e947fea149caca0fcfb226f21c0f0669ad
        http://127.0.0.1:8888/tree?token=9beed319458a43e947fea149caca0fcfb226f21c0f0669ad
....

The compute node name, h14, is internal to the ARC cluster. This name is not known outside of ARC, so you cannot connect to it directly from your own computer. Instead, you have to set up an SSH tunnel from your local computer to the node h14.
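The node name and the port needed for the tunnel can be read from the URL that the notebook server prints. As a minimal sketch, assuming the URL has the form shown in the output above, a small helper (the function name node_and_port is hypothetical) could extract both values:

```shell
# Hypothetical helper: print "<node> <port>" for a URL printed by the notebook server.
node_and_port() {
    # Capture the host name and the port number, and drop the rest of the URL.
    printf '%s\n' "$1" | sed -E 's#^https?://([^:/]+):([0-9]+).*#\1 \2#'
}

# Example with the URL from the output above.
node_and_port 'http://h14:8888/tree?token=9beed319458a43e947fea149caca0fcfb226f21c0f0669ad'
# prints: h14 8888
```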

Mac or Linux local computers

On your Mac or Linux computer, open a terminal session and connect to ARC using the normal SSH client with the tunnel option:

$ ssh username@arc.ucalgary.ca -L 8888:h14:8888
....

This is a normal ssh connection with the usual text mode session, but the session has to stay open to keep the tunnel open.

The ssh client creates a tunnel from your local computer's port 8888 to the compute node h14 via the ARC cluster's login node. The trick is that your local computer does not know the h14 name and has no way of connecting to it, but ARC's login node does.

So, any connection made to your own computer on port 8888 will be tunneled through ARC's login node to port 8888 of ARC's compute node h14. Conveniently, this is the port the notebook is expecting connections on.
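The general form of the tunnel option is -L local_port:node:remote_port. The sketch below only prints the command for a given node and port, so that it can be checked before it is run; the function name tunnel_cmd and its arguments are hypothetical. The local port defaults to the remote port, but a different one can be given, for example when port 8888 is already in use on your own computer.

```shell
# Hypothetical helper: print the ssh tunnel command for a given compute node.
#   $1 = ARC user name      $2 = compute node name
#   $3 = port on the node (default 8888)
#   $4 = port on the local computer (default: same as $3)
tunnel_cmd() {
    local user=$1
    local node=$2
    local remote_port=${3:-8888}
    local local_port=${4:-$remote_port}
    printf 'ssh %s@arc.ucalgary.ca -L %s:%s:%s\n' \
        "$user" "$local_port" "$node" "$remote_port"
}

tunnel_cmd username h14              # same port on both ends
tunnel_cmd username h14 8888 9999    # local port 9999, remote port 8888
```

If a different local port is used, the same port has to be used in the browser URL on the local computer.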

Now start your browser on your local computer and connect to the URL from the notebook's printout, using localhost as the host name:

http://localhost:8888/tree?token=9beed319458a43e947fea149caca0fcfb226f21c0f0669ad

Note that the host name in the URL has to be localhost (or 127.0.0.1), not h14. If you copy the URL that contains the node name, you have to edit the host name manually.
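The manual edit of the host name can also be done with a short helper. This is a sketch only, assuming the URL starts with http:// or https:// followed by the node name; the function name to_local_url is hypothetical.

```shell
# Hypothetical helper: replace the host part of a notebook URL with "localhost".
to_local_url() {
    # Keep the scheme, swap everything up to the first ":" or "/" for localhost.
    printf '%s\n' "$1" | sed -E 's#^(https?://)[^:/]+#\1localhost#'
}

to_local_url 'http://h14:8888/tree?token=9beed319458a43e947fea149caca0fcfb226f21c0f0669ad'
# prints: http://localhost:8888/tree?token=9beed319458a43e947fea149caca0fcfb226f21c0f0669ad
```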

Done.

The notebook server will run for the duration of the interactive job you requested on ARC, which is 3 hours in this example. After that, the notebook server will be killed by the system, so make sure that you save your progress before this happens.

If your local computer is outside of the UofC campus network you have to use the Fort VPN client to connect to the UofC network first.