Why Install JupyterHub?
I have long held a huge appreciation for Python, both as a general-purpose programming language and as a data science platform. However, as someone who is highly digitally mobile and works across many devices, I quickly tired of maintaining Python installations and data connections on every one of them, so I began searching for a solution. I settled on JupyterHub for the following reasons:
- Browser based. With a URL and login credentials, I can easily log in and pick up right where I left off.
- It uses a server's resources instead of local ones. This is particularly helpful because it extends my laptop's battery life when I am programming away from home. I can also run a piece of code, close the tab or browser, and come back hours or days later to the same spot, with no need to re-run the previous steps.
- Jupyter Notebook, as an IDE, makes documenting code easy: explanations live in their own cells rather than in comment strings.
- Data sources and database connections are static and do not need to be reworked, since they live on the server itself.
- I am hosting the server in a cloud environment, which lets me scale up and down as needed. This keeps costs down while still letting me ramp up processing power or other resources when required.
Now that I have established the why, here is the how.
The Installation Process
The first step of the installation process is to have something to install on! For this task, I chose an Ubuntu 16.04 droplet from Digital Ocean. I do not need anything too powerful to start with, so the smallest box will be fine, at least for the moment. Unfortunately, unlike some other providers such as AWS, Digital Ocean does not offer any Australian-based locations, which forced me to choose one overseas. After a ping test to their servers, I found, to my surprise, that the best one for me was New York. Now I just needed to add an SSH key and I was ready to hit the go button.
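Sending ICMP pings from Python generally requires raw sockets and therefore root. As a rough stand-in for the ping test, the sketch below times a TCP handshake to each candidate location and picks the fastest. It is a minimal sketch and not the method I actually used. The `speedtest-<region>.digitalocean.com` hostnames and port 443 are assumptions, so substitute whichever endpoints you want to compare.

```python
import socket
import time


def tcp_latency_ms(host, port=443, timeout=2.0):
    """Time one TCP handshake to host:port in milliseconds; None if unreachable."""
    try:
        address = socket.gethostbyname(host)  # resolve first so DNS time is not counted
        start = time.perf_counter()
        with socket.create_connection((address, port), timeout=timeout):
            pass
    except OSError:  # DNS failures, refused connections and timeouts are all OSErrors
        return None
    return (time.perf_counter() - start) * 1000


def fastest_region(latencies):
    """Return the region with the lowest latency, ignoring unreachable ones (None)."""
    reachable = {region: ms for region, ms in latencies.items() if ms is not None}
    if not reachable:
        return None
    return min(reachable, key=reachable.get)


# Example usage (assumed hostnames, not verified endpoints):
# candidates = {"nyc1": "speedtest-nyc1.digitalocean.com",
#               "sgp1": "speedtest-sgp1.digitalocean.com"}
# results = {region: tcp_latency_ms(host) for region, host in candidates.items()}
# print(results, fastest_region(results))
```

A single handshake is noisy. In practice, you would take several measurements per region and compare the medians.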
Now that the server is up and running, we can begin the JupyterHub installation.
To start, we will run through the JupyterHub Quick Start guide. I have chosen the pip and npm installation. However, as I am planning to have the server run as a service when the machine boots, I do not want to lock it into a Python environment, so I will not be running the python3 -m part of the instructions.
pip3 install jupyterhub
npm install -g configurable-http-proxy
pip3 install notebook
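Before starting the hub, it can be worth confirming that the installs landed on the PATH, because JupyterHub launches configurable-http-proxy and jupyterhub-singleuser (the per-user server launcher that ships with the jupyterhub package) as separate processes. This is a hedged sketch using only the standard library. The function name is my own.

```python
import shutil

# Executables the pip/npm installation above is expected to provide.
REQUIRED = ("jupyterhub", "configurable-http-proxy", "jupyterhub-singleuser")


def missing_commands(names=REQUIRED):
    """Return the commands from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Example usage: an empty list means everything was found.
# print(missing_commands())
```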
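At this point, you can start the JupyterHub server by running:
jupyterhub
If you then visit port 8000 on the server (http://localhost:8000 from the server itself, or http://<your-server-ip>:8000 from another machine), you can log in with your Unix credentials.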
With JupyterHub installed, we move on to configuring the server. There are a few changes I want to make:
- SSL encryption; and
- Spawner location
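Before we begin, let's create the config file in /etc/jupyterhub. The --generate-config flag writes a default jupyterhub_config.py to the current directory:
mkdir -p /etc/jupyterhub
cd /etc/jupyterhub
jupyterhub --generate-config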
SSL encryption is a must-have, because JupyterHub handles login credentials and runs code on the server. I will be using Let's Encrypt to obtain free SSL certificates. You could also use a self-signed certificate, if you can stand a security warning every time you navigate to the server.
I used Certbot to get my certificates. Certbot is distributed through a PPA, which you need to add to your system's list of repositories:
apt-get update
apt-get install software-properties-common
add-apt-repository ppa:certbot/certbot
apt-get update
apt-get install python-certbot-nginx
Once Certbot is installed, we run it in certonly mode. This is because we want to add the certificate to JupyterHub manually rather than have Certbot install it for us:
certbot --nginx certonly
Certbot also renews the certificates automatically using a cron job. You can test that renewal will work using:
certbot renew --dry-run
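Let's Encrypt certificates are valid for 90 days, and Certbot renews them once they are within 30 days of expiry. If you want to keep an eye on that yourself, this small sketch converts the `notAfter=` line printed by `openssl x509 -enddate -noout -in <certificate>` into the number of days remaining. It relies on the standard library's `ssl.cert_time_to_seconds`. The function name is hypothetical.

```python
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days remaining before a certificate's notAfter time (negative once expired).

    Accepts either "Jan  5 09:34:43 2018 GMT" or the "notAfter=..." form
    printed by `openssl x509 -enddate -noout`.
    """
    prefix = "notAfter="
    if not_after.startswith(prefix):
        not_after = not_after[len(prefix):]
    expires = ssl.cert_time_to_seconds(not_after.strip())  # seconds since the epoch (UTC)
    if now is None:
        now = time.time()
    return (expires - now) / 86400  # 86400 seconds per day
```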
Using the certificate
Now that we have a certificate, we need to tell JupyterHub to use it and where to find it. Do so by adding or adjusting these lines in the jupyterhub_config.py file we generated earlier:
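c.JupyterHub.ssl_key = 'path/to/my.key'    # private key; Let's Encrypt: /etc/letsencrypt/live/<domain>/privkey.pem
c.JupyterHub.ssl_cert = 'path/to/my.cert'  # certificate; Let's Encrypt: /etc/letsencrypt/live/<domain>/fullchain.pem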
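A mismatched key and certificate is an easy mistake to make at this point. This sketch asks Python's `ssl` module to load the pair the same way a TLS server would and returns the error message if loading fails. It assumes you run it as a user that can read the certificate files (for Let's Encrypt, usually root). The function name is hypothetical.

```python
import ssl


def check_cert_pair(cert_path, key_path):
    """Return None if the certificate and private key load together, else the error text."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        context.load_cert_chain(certfile=cert_path, keyfile=key_path)
    except OSError as exc:  # ssl.SSLError, FileNotFoundError and PermissionError are all OSErrors
        return str(exc)
    return None


# Example usage, with <domain> as a placeholder:
# print(check_cert_pair("/etc/letsencrypt/live/<domain>/fullchain.pem",
#                       "/etc/letsencrypt/live/<domain>/privkey.pem"))
```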
The spawner location sets the root directory that each user sees in JupyterHub. It could point to a shared location, giving all users access to the same files, or to a user-specific location. In my case, as I expect to be the only user, the two are one and the same. So in jupyterhub_config.py I added:
c.Spawner.notebook_dir = '~/notebooks'
JupyterHub expands ~ to each user's home directory. You also need to make sure this directory exists before the user logs in; otherwise, the user's notebook server will fail to start.
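One way to guarantee the directory exists is a pre-spawn hook. JupyterHub's `Spawner.pre_spawn_hook` setting accepts a function that receives the spawner and runs before each user's server starts. The sketch below could be pasted into jupyterhub_config.py, but it is an example built on assumptions rather than part of my setup. It assumes the default local Unix-user spawner, a hub running as root (as in the service set up later), and the ~/notebooks path configured above. Both function names are hypothetical.

```python
import os
import pwd


def ensure_notebook_dir(home, uid, gid, subdir="notebooks"):
    """Create <home>/<subdir> if it is missing and set its owner to uid/gid."""
    path = os.path.join(home, subdir)
    os.makedirs(path, exist_ok=True)
    # When the hub runs as root, this hands the new folder over to the user.
    os.chown(path, uid, gid)
    return path


def create_notebook_dir(spawner):
    """Hypothetical pre-spawn hook: look up the Unix account and create ~/notebooks."""
    account = pwd.getpwnam(spawner.user.name)
    ensure_notebook_dir(account.pw_dir, account.pw_uid, account.pw_gid)


# In jupyterhub_config.py, where `c` is provided by JupyterHub:
# c.Spawner.pre_spawn_hook = create_notebook_dir
```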
With everything configured, you can test JupyterHub by running the following command:
jupyterhub -f /etc/jupyterhub/jupyterhub_config.py
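While that command is running, a quick way to confirm the hub is up is to request its version endpoint, /hub/api, which JupyterHub serves without authentication. Here is a hedged sketch using the standard library. The function name and the example URL are hypothetical.

```python
import json
import urllib.request


def hub_version(base_url, timeout=5.0):
    """Return the version reported by <base_url>/hub/api, or None if the hub is unreachable."""
    url = base_url.rstrip("/") + "/hub/api"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response).get("version")
    except (OSError, ValueError):
        # OSError covers URLError, refused connections, timeouts and TLS failures;
        # ValueError covers a response body that is not valid JSON.
        return None


# Example usage, assuming the hub still listens on the default port 8000:
# print(hub_version("https://hub.example.com:8000"))
```

Use the domain name the certificate was issued for. Requesting https://localhost would fail certificate verification and return None.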
Running JupyterHub as a service
With JupyterHub set up, the last thing I felt was necessary was to run it as a service that starts automatically after each reboot. To do so, save the following as /lib/systemd/system/jupyterhub.service:
[Unit]
Description=JupyterHub
Wants=network-online.target
After=network-online.target

[Service]
User=root
WorkingDirectory=/etc/jupyterhub
ExecStart=/usr/local/bin/jupyterhub -f /etc/jupyterhub/jupyterhub_config.py

[Install]
WantedBy=multi-user.target
Once it is saved, reload systemd, enable the service so that it starts on boot, and start it now. Then we're done!
systemctl daemon-reload
systemctl enable jupyterhub
systemctl start jupyterhub
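To confirm the service is running now and is set to start at boot, you can ask systemd directly with `systemctl is-active` and `systemctl is-enabled`. Below is a small sketch wrapping both, with a hypothetical helper name. Note that `is-enabled` only succeeds once `systemctl enable jupyterhub` has been run.

```python
import subprocess


def systemctl_ok(verb, unit="jupyterhub"):
    """Run `systemctl <verb> --quiet <unit>` and report success as a bool.

    Useful verbs: "is-active" (running now) and "is-enabled" (starts at boot).
    """
    try:
        result = subprocess.run(
            ["systemctl", verb, "--quiet", unit],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
            timeout=10,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # systemctl is missing (non-systemd host) or did not answer in time.
        return False
    return result.returncode == 0


# Example usage:
# print(systemctl_ok("is-active"), systemctl_ok("is-enabled"))
```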
This project and blog were completed with the help of the following: