The default configuration of Bright’s Jupyter integration supports up to 1000 users on a single login node, provided the machine has adequate memory and CPU resources. The number of compute nodes should be sized so that these users can actually schedule their workloads.
To allow the login host to support more than 1000 concurrent user sessions, several parameters can be adjusted.
# allow Jupyter Enterprise Gateway to open more ports
echo "EG_PORT_RETRIES=10000" >> /etc/default/jupyterhub-singleuser-gw
# support more open files and sockets
echo "fs.file-max = 268435456" > /etc/sysctl.d/95-jupyter-sysctl.conf
sysctl -w fs.file-max=268435456
systemctl restart cm-jupyterhub.service
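After raising the limit, it can be useful to confirm that the new value is active and to see how close the host is to it. The following is a minimal sketch, not part of Bright's tooling: the helper name file_handle_usage is hypothetical, and it relies only on the standard Linux file /proc/sys/fs/file-nr, whose three fields are the allocated handles, the allocated-but-unused handles, and the maximum (fs.file-max).

```shell
#!/usr/bin/env bash
# Report how much of the system-wide file-handle limit is in use.

# file_handle_usage ALLOCATED MAX -> prints the integer percentage of MAX in use
# (hypothetical helper name, chosen for this illustration)
file_handle_usage() {
    local allocated=$1 max=$2
    echo $(( allocated * 100 / max ))
}

# Read the live values when the proc file is available (Linux only).
if [ -r /proc/sys/fs/file-nr ]; then
    read -r allocated _unused max < /proc/sys/fs/file-nr
    echo "fs.file-max: ${max}"
    echo "allocated handles: ${allocated} ($(file_handle_usage "$allocated" "$max")% of limit)"
fi
```

Running this on the login node (and on the head node, where applicable) during a benchmark gives a rough indication of whether the file-handle limit is the bottleneck.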
If the login node where JupyterHub is running is a dedicated host and not a head node, the limit for open files must be changed on both the head node and the login node (through the software image).
For benchmarking and validation, Bright Computing has prepared a set of scripts to test a Jupyter setup: https://github.com/Bright-Computing/jupyter-stress-test
- Create 100 users using cmsh
./run prepare 100
- Log users into Jupyter and simulate user session
Here 10 new users will be logged in simultaneously, followed by logging in to all existing sessions with 20 concurrent processes.
./run benchmark https://loginnode:8000 10 20
The final output will be collected in benchmark.log
- Remove all test users
./run clear loginnode
- To draw a graph based on gathered statistics
./run graph benchmark.log
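The steps above can be chained into one cycle. The sketch below is illustrative only: the variable names (HUB_URL, LOGIN_NODE, NUM_USERS, NEW_LOGINS, WORKERS, DRY_RUN) and the functions step and benchmark_cycle are assumptions made for this example and are not options of the stress-test scripts. It defaults to a dry run that only prints the ./run commands listed above, and it assumes it is started from the directory containing ./run.

```shell
#!/usr/bin/env bash
# Sketch: one full stress-test cycle built from the ./run commands listed above.
# All variable names here are illustrative, not part of the stress-test scripts.

HUB_URL=${HUB_URL:-https://loginnode:8000}
LOGIN_NODE=${LOGIN_NODE:-loginnode}
NUM_USERS=${NUM_USERS:-100}
NEW_LOGINS=${NEW_LOGINS:-10}
WORKERS=${WORKERS:-20}
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only, 0 = execute them

# step CMD... -> print the command; execute it only when DRY_RUN=0
step() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Run the steps in the order given in the text; stop at the first failure.
benchmark_cycle() {
    step ./run prepare "$NUM_USERS" &&
    step ./run benchmark "$HUB_URL" "$NEW_LOGINS" "$WORKERS" &&
    step ./run clear "$LOGIN_NODE" &&
    step ./run graph benchmark.log
}

benchmark_cycle
```

Setting DRY_RUN=0 executes the commands instead of printing them; reviewing the dry-run output first helps catch a wrong URL or host name before any users are created.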
If multiple ./run benchmark runs are required, it is advised to restart JupyterHub on the login node between runs to stop the users’ sessions:
systemctl restart cm-jupyterhub.service
To force JupyterHub to remove test users from its internal database, add the following line to /cm/local/apps/jupyter/current/conf/jupyterhub_config.py:
c.Authenticator.delete_invalid_users = True
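When the file is edited by hand across several benchmark rounds, it is easy to add the setting more than once. A minimal sketch of an idempotent append follows; the helper name ensure_line and the CONFIG override variable are assumptions made for this example, while the path and the setting are taken from the text above.

```shell
#!/usr/bin/env bash
# ensure_line LINE FILE -> append LINE to FILE unless an identical line already exists
# (hypothetical helper name, chosen for this illustration)
ensure_line() {
    local line=$1 file=$2
    # -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Path taken from the text; override CONFIG to try this on a scratch file first.
CONFIG=${CONFIG:-/cm/local/apps/jupyter/current/conf/jupyterhub_config.py}
SETTING='c.Authenticator.delete_invalid_users = True'

# Only touch the file when it already exists and is writable.
if [ -w "$CONFIG" ]; then
    ensure_line "$SETTING" "$CONFIG"
fi
```

JupyterHub reads its configuration at startup, so the setting takes effect after the next restart of cm-jupyterhub.service.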
Or via cmsh:
configurationoverlay
-> use jupyterhub
-> roles
-> use jupyterhub
-> configs
-> add c.Authenticator.delete_invalid_users
-> ..
-> set c.authenticator.delete_invalid_users True
-> commit
Here is an example of the output. Results were gathered on a server with 2 x Intel(R) Xeon(R) Platinum 8168 CPUs @ 2.70GHz and 1.4TB of RAM.
We can conclude that this host can serve more than 5000 concurrent users, but once the login time exceeds 30 seconds (the default JupyterHub timeout for connecting to user sessions), users will increasingly encounter connection errors.