Draft:Jupyter Notebooks: Difference between revisions

From RCSWiki
(initial changes)
 
No edit summary
Line 53: Line 53:
* '''Connections''' are only available for computers that are part of the UofC campus network or through the University General VPN (FortiVPN).
* '''Connections''' are only available for computers that are part of the UofC campus network or through the University General VPN (FortiVPN).


===Running a Jupyter Notebook directly on a compute node===
===Run manually on a compute node===
By using Jupyter from a python installation on ARC, you can run an instance of Jupyter Notebook on ARC's compute nodes and request resources according to one's need.  
You may run your own instance of Jupyter Notebook directly on ARC via an interactive job. This method uses a Python installation already available on ARC and lets you control the precise resource requests to the scheduler. This is the most flexible way to run a notebook, but it requires more steps to set up.
This is most flexible way to run a notebook, but it does require more steps to setup.


Start an interactive job on ARC:
To begin, start an interactive job on ARC:
<pre>
<pre>
$ salloc -N 1 -n 1 -c 12 --mem=0 -t 3:00:00 -p parallel
$ salloc -N 1 -n 1 -c 12 --mem=0 -t 3:00:00 -p parallel
Line 65: Line 64:
[username@cn0526 ~]$  
[username@cn0526 ~]$  
</pre>
</pre>
The command requested a 3 hours interactive job on the Parallel partition.
The command requested a 3-hour interactive job on the Parallel partition. The request includes all 12 CPUs and all 23GB of RAM on the compute node. In the example above, the compute node '''cn0526''' was allocated to the job.
The request includes all 12 CPUs and all the 23GB of RAM on the compute node.
The compute node '''cn026''', was allocated to the job.


Load a python software module you want to use and start a '''jupyter notebook''' server.
Load a python software module you want to use and start a '''jupyter notebook''' server. Before starting the notebook we have to unset the '''XDG_RUNTIME_DIR''' variable, otherwise the notebook crashes.
Before starting the notebook we have to unset the '''XDG_RUNTIME_DIR''' variable,
otherwise the notebook crashes.
<pre>
<pre>
$ module load python/anaconda3-2018.12
$ module load python/anaconda3-2018.12
Line 91: Line 86:
         http://(cn0526 or 127.0.0.1):8888/?token=bedfe1920e97a309c583a5f2895cf1367e37a7dac416494b
         http://(cn0526 or 127.0.0.1):8888/?token=bedfe1920e97a309c583a5f2895cf1367e37a7dac416494b
</pre>
</pre>
At this point the notebook server is up and listening for connections.  
At this point the notebook server is up and listening for connections. Use the browser on your local computer (desktop or laptop) to connect to it, with the URL and token from the last line of that printout.
You need to use your browser on your local computer (desktop or laptop) to connect to it.
You have to use the URL with the token from the last line of that print out.


The problem is that the compute node '''cn0526''' is on an internal network within the ARC cluster and cannot be accessed directly from outside of ARC. To work around this, we have to set up an '''SSH tunnel''' from your local computer to the node '''cn0526'''.


The problem is that the compute node name, '''cn0526''', is an internal name to the ARC cluster.
====SSH Tunnel on MacOS or Linux====
This name is not known outside of ARC, so really, you cannot connect to it as it is.
So we have to setup an '''SSH tunnel''' from your local computer to that node '''cn0526'''.
 
===Mac or Linux local computers===
On your Mac or Linux computer, open a terminal session and connect to ARC using the normal '''SSH client'''
On your Mac or Linux computer, open a terminal session and connect to ARC using the normal '''SSH client'''
with the '''tunnel''' option:
with the '''tunnel''' option:
Line 107: Line 97:
....
....
</pre>
</pre>
This is a normal ssh connection with a usual text mode session, but we need this session on to keep the tunnel open.
This is a normal SSH connection with the usual text-mode session, but the session must stay open to keep the tunnel open.


The ssh client creates a tunnel from '''your local computer''''s port 8888 to the '''compute node cn0526''' via the ARC's cluster '''login node'''.
The SSH client creates a tunnel from '''your local computer''''s port 8888 to the '''compute node cn0526''' via the ARC's cluster '''login node'''.
The trick is that your local computer does not know the '''cn0526''' name and has no way of connecting to it, but the ARC's login node does.
The trick is that your local computer does not know the '''cn0526''' name and has no way of connecting to it, but the ARC's login node does.


Line 118: Line 108:
Now '''start your browser''' on your local computer and connect to the URL from the notebook's printout:
Now '''start your browser''' on your local computer and connect to the URL from the notebook's printout:


<code>http://localhost:8888/?token=bedfe1920e97a309c583a5f2895cf1367e37a7dac416494b</code>
<code><nowiki>http://localhost:8888/?token=bedfe1920e97a309c583a5f2895cf1367e37a7dac416494b</nowiki></code>


Note that the name of the web server is '''localhost''', so you have to edit the URL manually.
Note that the name of the web server is '''localhost''', so you have to edit the URL manually.
Line 130: Line 120:
If your local computer is outside of the UofC campus network, you have to use the FortiVPN client to connect to the UofC network first.
If your local computer is outside of the UofC campus network, you have to use the FortiVPN client to connect to the UofC network first.


=== SSH Tunnel on Windows ===
If you are using OpenSSH, the same instructions for Mac and Linux apply.
In PuTTY, after connecting to ARC, you will need to create an SSH tunnel through the PuTTY settings.


# Open the PuTTY settings (right click on the window -> Change Settings)
# In the left sidebar, under Category, navigate to Connection > SSH > Tunnels.
# Select Local as the type of SSH port forward. This corresponds to the <code>-L</code> option of the OpenSSH client.
# In the Source port field, enter the port number to use on your local system. (For example, Source port: 8888)
# In the Destination field, enter the destination address followed by the port number. (For example, Destination: cn0526:8888)
# Verify the details you entered and press the Add button.
# Close out of the settings window.
# You should now be able to access Jupyter Notebook at <code><nowiki>http://localhost:8888</nowiki></code>, using the token from the notebook's printout.


== See Also ==
== See Also ==
Line 138: Line 140:


*Manual: https://jupyter.readthedocs.io/en/latest/
*Manual: https://jupyter.readthedocs.io/en/latest/
==Windows==
To be determined....

Revision as of 17:36, 8 July 2021

Jupyter Notebook is an interactive web-based environment used for creating a wide range of workflows in data science, machine learning, and general computation. Jupyter Notebooks support a wide range of programming languages, including Python, Julia, and R.

There are related products which help launch and manage Jupyter Notebooks including:

  • Jupyter Hub, which allows users to launch a notebook instance from a web portal, and
  • Jupyter Lab, which is Jupyter's next-generation notebook interface and comes with a more flexible user interface.

This page goes over the options available to you to get started with Jupyter Notebook.

Compute Canada

For users that have a Compute Canada account, the following options are available.

Jupyter Hub for UofC (Syzygy)

The Pacific Institute for the Mathematical Sciences, in collaboration with Compute Canada and Cybera, offers cloud-based hubs to universities and schools. Through this offering, a Jupyter Hub is available to the University of Calgary. It is hosted on the Compute Canada cloud and is intended for training purposes only. Notebooks on this service support python2, python3, julia, and R. For more information about this service, please see: https://www.computecanada.ca/featured/compute-canada-and-pims-launch-jupyter-service-for-researchers/

Login with your UofC credentials.

Here is some information about the hub:

Jupyter on Compute Canada clusters

Compute Canada services require a Compute Canada account. An account can be obtained at https://ccdb.computecanada.ca

General information on running a Jupyter server on a compute node can be found at https://docs.computecanada.ca/wiki/Jupyter

Beluga cluster Jupyter Hub

At https://jupyterhub.beluga.calculquebec.ca/hub/login

  • Log in with your Compute Canada credentials.
  • In the Server Options dialog, select the account, time, number of CPUs, and amount of memory.
  • Click the Start button.
  • Once the server starts, click on the Python 3.7 notebook button to get a Jupyter Python notebook session.
  • To exit, select File -> Logout in the menu bar.

Niagara cluster at the University of Toronto

The Niagara cluster hosted as part of Compute Canada also runs a Jupyter Hub. More information at https://docs.scinet.utoronto.ca/index.php/Jupyter_Hub

You may use your Compute Canada account at: https://jupyter.scinet.utoronto.ca/

ARC

RCS offers the following methods to use Jupyter on ARC.

Jupyter Hub on ARC

A Jupyter Hub is running on the ARC cluster and is available at http://jupyter.ucalgary.ca from a University campus network. Please be aware of the following limitations:

  • You must have an ARC account to connect to the hub.
  • Currently supports python3 only.
  • Connections are only available for computers that are part of the UofC campus network or through the University General VPN (FortiVPN).

Run manually on a compute node

You may run your own instance of Jupyter Notebook directly on ARC via an interactive job. This method uses a Python installation already available on ARC and lets you control the precise resource requests to the scheduler. This is the most flexible way to run a notebook, but it requires more steps to set up.

To begin, start an interactive job on ARC:

$ salloc -N 1 -n 1 -c 12 --mem=0 -t 3:00:00 -p parallel
    salloc: Granted job allocation 5486616
    salloc: Waiting for resource configuration
    salloc: Nodes cn0526 are ready for job
[username@cn0526 ~]$ 

The command requested a 3-hour interactive job on the Parallel partition. The request includes all 12 CPUs and all 23GB of RAM on the compute node. In the example above, the compute node cn0526 was allocated to the job.

Load the Python software module you want to use and start a Jupyter Notebook server. Before starting the notebook, unset the XDG_RUNTIME_DIR variable; otherwise the notebook crashes.

$ module load python/anaconda3-2018.12
$ unset XDG_RUNTIME_DIR

$ jupyter notebook --no-browser --ip=0.0.0.0

[I 16:04:31.618 NotebookApp] JupyterLab extension loaded from /global/software/anaconda/anaconda3-2018.12/lib/python3.6/site-packages/jupyterlab
[I 16:04:31.618 NotebookApp] JupyterLab application directory is /global/software/anaconda/anaconda3-2018.12/share/jupyter/lab
[I 16:04:31.621 NotebookApp] Serving notebooks from local directory: /home/drozmano
[I 16:04:31.621 NotebookApp] The Jupyter Notebook is running at:
[I 16:04:31.621 NotebookApp] http://(cn0526 or 127.0.0.1):8888/?token=bedfe1920e97a309c583a5f2895cf1367e37a7dac416494b
[I 16:04:31.621 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 16:04:31.673 NotebookApp] 
    
    To access the notebook, open this file in a browser:
        file:///home/drozmano/.local/share/jupyter/runtime/nbserver-25483-open.html
    Or copy and paste one of these URLs:
        http://(cn0526 or 127.0.0.1):8888/?token=bedfe1920e97a309c583a5f2895cf1367e37a7dac416494b

At this point the notebook server is up and listening for connections. Use the browser on your local computer (desktop or laptop) to connect to it, with the URL and token from the last line of that printout.
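The node name, port, and token in that URL are the pieces needed for the remaining steps. As a minimal sketch, they can be pulled apart in the shell. The function name `parse_notebook_url` is hypothetical, and the patterns assume the URL has exactly the form shown in the printout above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Jupyter or ARC): split the URL printed by
# the notebook server into compute node name, port, and token.
# Assumes the form: http://(NODE or 127.0.0.1):PORT/?token=TOKEN
parse_notebook_url() {
    local url="$1"
    local node port token
    # Node name: the text between "(" and " or"
    node=$(printf '%s\n' "$url" | sed -E 's#^http://\(([^ )]+) or [^)]*\):.*#\1#')
    # Port: the digits after the closing ")" and ":"
    port=$(printf '%s\n' "$url" | sed -E 's#^http://\([^)]*\):([0-9]+)/.*#\1#')
    # Token: the hexadecimal string after "token="
    token=$(printf '%s\n' "$url" | sed -E 's#.*[?&]token=([0-9a-f]+).*#\1#')
    printf '%s %s %s\n' "$node" "$port" "$token"
}

parse_notebook_url 'http://(cn0526 or 127.0.0.1):8888/?token=bedfe1920e97a309c583a5f2895cf1367e37a7dac416494b'
# prints: cn0526 8888 bedfe1920e97a309c583a5f2895cf1367e37a7dac416494b
```

Checking the port here is worthwhile because the server may report a port other than 8888, and the tunnel has to point at whatever port the printout shows.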

The problem is that the compute node cn0526 is on an internal network within the ARC cluster and cannot be accessed directly from outside of ARC. To work around this, we have to set up an SSH tunnel from your local computer to the node cn0526.

SSH Tunnel on MacOS or Linux

On your Mac or Linux computer, open a terminal session and connect to ARC using the normal SSH client with the tunnel option:

$ ssh username@arc.ucalgary.ca -L 8888:cn0526:8888
....

This is a normal SSH connection with the usual text-mode session, but the session must stay open to keep the tunnel open.

The SSH client creates a tunnel from your local computer's port 8888 to the compute node cn0526 via the ARC cluster's login node. The trick is that your local computer does not know the cn0526 name and has no way of connecting to it, but ARC's login node does.

So, any connection made to your own computer on port 8888 will be tunneled through ARC's login node to port 8888 of ARC's compute node cn0526. Conveniently, this is the port on which the notebook is expecting connections.
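Because the node name changes from job to job, it can help to assemble the tunnel command from its parts. The following is only a sketch: `tunnel_command` is a hypothetical helper that prints the command rather than running it, and it assumes the same local and remote port, as in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ssh tunnel command for a given ARC user name,
# compute node, and port (the port defaults to 8888, as in the example above).
tunnel_command() {
    local user="$1" node="$2" port="${3:-8888}"
    # -L LOCAL_PORT:NODE:REMOTE_PORT forwards the local port to the compute node
    printf 'ssh %s@arc.ucalgary.ca -L %s:%s:%s\n' "$user" "$port" "$node" "$port"
}

tunnel_command username cn0526 8888
# prints: ssh username@arc.ucalgary.ca -L 8888:cn0526:8888
```

Replace `username` and `cn0526` with your own ARC user name and the node reported by `salloc`, then run the printed command in a local terminal.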

Now start your browser on your local computer and connect to the URL from the notebook's printout:

http://localhost:8888/?token=bedfe1920e97a309c583a5f2895cf1367e37a7dac416494b

Note that the name of the web server is localhost, so you have to edit the URL manually.
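The manual edit amounts to replacing the parenthesised host part of the printed URL with localhost. A small sketch of that substitution with sed follows. The function name `to_local_url` is hypothetical, and it assumes the tunnel uses the same port on the local side as the notebook reports.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the URL printed by the notebook server so that
# it points at the local end of the SSH tunnel.
to_local_url() {
    # Replace "(NODE or 127.0.0.1)" with "localhost"; port and token are kept.
    printf '%s\n' "$1" | sed -E 's#^http://\([^)]*\)#http://localhost#'
}

to_local_url 'http://(cn0526 or 127.0.0.1):8888/?token=bedfe1920e97a309c583a5f2895cf1367e37a7dac416494b'
# prints: http://localhost:8888/?token=bedfe1920e97a309c583a5f2895cf1367e37a7dac416494b
```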

Done.

The notebook server will run for the duration of the interactive job you requested on ARC. In this example, for 3 hours. Then the notebook server will be killed by the system, so make sure that you save your progress before this happens.

If your local computer is outside of the UofC campus network, you have to use the FortiVPN client to connect to the UofC network first.

SSH Tunnel on Windows

If you are using OpenSSH, the same instructions for Mac and Linux apply.

In PuTTY, after connecting to ARC, you will need to create an SSH tunnel through the PuTTY settings.

  1. Open the PuTTY settings (right click on the window -> Change Settings)
  2. In the left sidebar, under Category, navigate to Connection > SSH > Tunnels.
  3. Select Local as the type of SSH port forward. This corresponds to the -L option of the OpenSSH client.
  4. In the Source port field, enter the port number to use on your local system. (For example, Source port: 8888)
  5. In the Destination field, enter the destination address followed by the port number. (For example, Destination: cn0526:8888)
  6. Verify the details you entered and press the Add button.
  7. Close out of the settings window.
  8. You should now be able to access Jupyter Notebook at http://localhost:8888, using the token from the notebook's printout.

See Also

For more information on Jupyter: