
The kiosk-console
is the entry point for users to spin up a DeepCell Kiosk, a cloud-native implementation of the DeepCell ecosystem.
The DeepCell Kiosk is designed to allow researchers to easily deploy and scale a deep learning platform for biological image analysis. Once launched, users can drag-and-drop images to be processed in parallel using publicly available, or custom-built, TensorFlow models. To train custom models, please refer to deepcell-tf, which was designed to facilitate model development and is capable of exporting these models for use with the DeepCell Kiosk.
The scalability of the DeepCell Kiosk software is enabled by Kubernetes. At present, the Kiosk is only compatible with Google Cloud.
An example of the DeepCell Kiosk is live at DeepCell.org.
Features¶
Cloud-based deployment of deep-learning models
Scalable platform that minimizes cost and inference time
Drag and drop interface for running predictions
Examples¶
(Example: raw image and tracked image, side by side.)
Getting Started¶
Start a terminal shell and install the DeepCell Kiosk wrapper script:
docker run -e DOCKER_TAG=1.8.1 vanvalenlab/kiosk-console:1.8.1 | sudo bash
To start the kiosk, just run `kiosk-console` from the terminal shell.
Check out our docs for more information on how to start your own kiosk.
Software Architecture¶

Consumer: Retrieves items from the Job Queue and handles the processing pipeline for that item. Each consumer only works on one item at a time.
Model Server: Serves models over a gRPC API, allowing consumers to send data and get back predictions.
GPU Autoscaler: Automatically and efficiently scales Kubernetes GPU resources.
Frontend: API for creating and managing jobs, and a React-based web interface for the DeepCell Kiosk.
Additional Data Entry Tools:
ImageJ Plugin: An ImageJ 1.x plugin for processing images with an existing cluster.
Command-line Interface: A Python-based CLI for submitting and managing DeepCell Kiosk jobs.
Not pictured above:
Bucket Monitor: Purges the bucket of uploaded and processed files that are older than `AGE_THRESHOLD`, 3 days by default.
Janitor: Monitors in-progress items and makes sure no jobs are left unfinished.
Contribute¶
We welcome contributions to the Kiosk. If you are interested, please refer to our Developer Documentation, Code of Conduct, and Contributing Guidelines.
Support¶
Issues are managed through GitHub. Documentation is hosted on Read the Docs. A FAQ page is also available.
License¶
This software is licensed under a modified Apache-2.0 license. See LICENSE for full details.
Trademarks¶
All other trademarks referenced herein are the property of their respective owners.
Credits¶
This kiosk was developed with Cloud Posse, LLC. They can be reached at hello@cloudposse.com.
Copyright¶
Copyright © 2018-2022 The Van Valen Lab at the California Institute of Technology (Caltech), with support from the Shurl and Kay Curci Foundation, the Paul Allen Family Foundation, Google, & National Institutes of Health (NIH) under Grant U24CA224309-01. All rights reserved.
Getting Started¶
Google Cloud Setup¶
Warning
Google Cloud Platform must approve several requests that may take up to 1 day to complete.
If necessary, create an account at Google Cloud and create a Google Cloud Project, making sure you have at least one account with the Owner role.
Make sure the Kubernetes Engine API is enabled.
The recent success of deep learning has been critically dependent on accelerated hardware like GPUs. Similarly, the strength of the DeepCell Kiosk is its ability to recruit and scale GPU nodes based on demand. In order to add accelerated hardware to the clusters you launch, you will need to upgrade your Google Cloud account, as GPUs are unavailable to free-tier accounts.
Note
The account upgrade may take some time, as Google will need to approve the upgrade. You may also need to log in and out of your account for the upgrade to take effect. Once your account is upgraded you should be able to see GPU options in the quota panel.
Apply for a quota of at least 1 “GPU (all regions)” and at least 16 “In-use IP addresses global”. This may take some time, as Google will need to approve each of these requests.
Note
Google offers a number of GPU types. The DeepCell Kiosk uses pre-emptible NVIDIA T4 GPUs for inference by default. To request more than one GPU, you must make a quota request for that resource in your chosen region.
Warning
Currently only pre-emptible GPUs are supported by the DeepCell Kiosk.
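Once a quota is granted, you can spot-check GPU availability from the command line. A minimal sketch, assuming the gcloud CLI is installed and authenticated (us-west1 is a placeholder for your region):

# list the region's quota entries and look for preemptible T4 GPUs
gcloud compute regions describe us-west1 | grep -i -B1 -A1 "t4"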
Create a cloud storage bucket in the default region of your project (this should be a “Standard class” bucket, which you can select using fine-grained access control). This will be used to store data and models. Record the bucket name, which will be needed during Kiosk configuration. Please do not use underscores (_) in your bucket name. Your bucket should be organized as follows:

gs://[BUCKET-NAME]
|-- models
|   |-- Exported model 1 folder
|   |-- Exported model 2 folder
|-- uploads
|-- output
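As a sketch, the bucket can also be created from the command line with gsutil; `[BUCKET-NAME]` and the region below are placeholders, and `-b off` selects fine-grained access control:

# create a Standard-class bucket with fine-grained access control
gsutil mb -c standard -l us-west1 -b off gs://[BUCKET-NAME]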
Please note that the Kiosk comes “preloaded” with a few commonly used models. These models are hosted in our public `deepcell-models` bucket on Google Cloud. However, if you wish to use custom models, you can do so by altering the `GCLOUD_STORAGE_BUCKET` environment variable in the `tf-serving` helmfile. The contents of `/uploads` and `/output` are managed by the kiosk-bucket-monitor.
Warning
The DeepCell Kiosk is optimized for cost-effectiveness. However, please ensure that your bucket and Kubernetes cluster are in the same region. See here for details; simply put, you pay significantly more if your Kubernetes cluster and bucket are in different regions.
Launching the DeepCell Kiosk¶
One of the enabling technologies the DeepCell Kiosk utilizes is Docker (free Community Edition). Installation is easy on Linux and macOS, but setup can be complicated on Windows. For this reason, we recommend that Windows users employ an Ubuntu VM or follow the cloud jumpbox workflow outlined below.
If you plan on maintaining the DeepCell Kiosk as a persistent tool, we recommend using the jumpbox workflow, which allows you to manage the system from a Google Cloud VM. This prevents unexpected or accidental computer shutdowns that occur locally from interfering with your ability to manage the Kiosk.
Select the Docker installation that is best for you:
Local Docker Installation - Windows¶
Install WSL and the Ubuntu Linux distribution
Once installed, follow the Docker installation instructions for Linux
Local Docker Installation - MacOS and Linux¶
Follow the docker installation instructions for your operating system
Cloud-Based Jumpbox Workflow¶
Navigate to the VM instances page in the Google Cloud Console and create a new VM instance.
Check that your boot disk is configured with a `Debian/Ubuntu` operating system.
Warning
Container optimized images do not support Kiosk installation.
All other settings can be left as defaults
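Alternatively, the jumpbox can be created with the gcloud CLI. A minimal sketch, assuming gcloud is installed and authenticated; the instance name and zone are placeholders:

# create a small Debian VM to serve as the jumpbox
gcloud compute instances create kiosk-jumpbox \
    --image-family=debian-11 \
    --image-project=debian-cloud \
    --zone=us-west1-b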
After creating the instance, SSH into your instance either using the option provided by Google Cloud or through your local terminal.
If you have chosen to SSH into the machine from a terminal on your local machine, simply paste the following commands, copied from the Docker installation guide for Debian:
sudo apt-get update && \
sudo apt-get install -y apt-transport-https ca-certificates curl gnupg2 software-properties-common && \
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add - && \
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian $(lsb_release -cs) stable" && \
sudo apt-get update && \
sudo apt-get install -y containerd.io docker-ce docker-ce-cli git make vim
To manage Docker as a non-root user on Linux, create the `docker` group and add your user to it using the commands below, then disconnect and reconnect to the server.
sudo groupadd docker
sudo usermod -aG docker $USER
Verify docker is installed correctly:
docker run hello-world
Starting the Kiosk¶
You are now ready to start the Kiosk!
Start a terminal shell and install the DeepCell Kiosk wrapper script:
docker run -e DOCKER_TAG=1.8.1 vanvalenlab/kiosk-console:1.8.1 | sudo bash
Note
This command and the one that follows may need to be preceded by `sudo`, depending on your permission settings; you will be prompted for your password.
To start the Kiosk, just run `kiosk-console` from the terminal shell.
(Screenshots: Welcome Page and Main Menu.)
Note
Those interested in Kiosk development should follow a different path to start the Kiosk, described in the Developer Documentation.
DeepCell Kiosk Usage¶
Once the Kiosk Console has started, select the `Configure` option for your chosen cloud provider (currently, only Google Kubernetes Engine is supported). The next screen will prompt you to authenticate your account with gcloud or to continue with a previously authenticated account. The next several screens will prompt you to select a gcloud project, name your cluster, and enter a bucket name for data storage. If you followed the Google Cloud Setup instructions above, use that project and bucket name.

To complete cluster configuration, you can choose between “Default 1 GPU”, “Default 4 GPU”, and “Advanced” configurations. The “Default 1 GPU” option sets up a small cluster suitable for users who want a sandbox to explore. The “Default 4 GPU” option configures a cluster with 4 GPUs and higher-memory nodes to handle larger inference jobs. The “Advanced” option allows users to configure each setting individually.
Once cluster configuration is complete, you will return to the home screen. There you can select the “Create” option to trigger cluster creation based on your configured values. This may take up to 10 minutes. Following successful creation, you will see a confirmation page.
(Screenshot: Cluster Created Successfully.)
Find the cluster’s web address by choosing the `View` option from the Kiosk’s main menu. (Depending on your chosen cloud provider and its settings, your cluster’s address might be either a raw IP address, e.g., `123.456.789.012`, or a URL, e.g., `deepcellkiosk.cloudprovider.com`.)

Go to the cluster address in your web browser to find the DeepCell Kiosk frontpage. To run a job (load raw data and download the results), use the `Predict` tab. The `Predict` page on DeepCell.org allows for different job types (e.g., nuclear segmentation and/or nuclear tracking). Each job type requires a specific model. For example models and data, refer to DeepCell.org.
Note
The first prediction may take some time as the model server comes online.
Troubleshooting¶
We’ve done our best to make the DeepCell Kiosk robust to common use cases; however, there may be unforeseen issues. In the following sections (as well as in our FAQ), we hope to cover some possible sources of frustration. If you run across a new problem not listed in either location, please feel free to open an issue on the kiosk-console repository.
Permission denied while trying to connect to the Docker daemon socket
Recipe for target 'docker/build' failed make: *** [docker/build] Error 1
My pods are not autoscaling because the custom metrics are not updating!
My predictions keep failing and I have a lot of models (or model versions) in my `models` folder.
DOCKER not defined in docker/build¶
DOCKER not defined in docker/build
[directory]/kiosk/build-harness/modules/docker/Makefile.build:9: recipe for target 'docker/build' failed
make: *** [docker/build] Error 1
Docker is not installed. Refer to Getting Started for guidance on how to install docker.
Permission denied while trying to connect to the Docker daemon socket¶
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post h
ttp://%2Fvar%2Frun%2Fdocker.sock/v1.35/build?buildargs=%7B%7D&cachefrom=%5B%5D&cgroupparent=&cpuperiod=0&cpuquot
a=0&cpusetcpus=&cpusetmems=&cpushares=0&dockerfile=Dockerfile&labels=%7B%7D&memory=0&memswap=0&networkmode=defau
lt&rm=1&session=57da952107578b7cdaa0d35d533aefc8af001e6be3cb06960fe651a7f7990217&shmsize=0&t=vanvalenlab%2Fkiosk
%3Alatest&target=&ulimits=null: dial unix /var/run/docker.sock: connect: permission denied
[directory]/kiosk/build-harness/modules/docker/Makefile.build:9: recipe for target 'docker/build' failed
make: *** [docker/build] Error 1
This error means that your current user is not a member of the `docker` user group. If you are running Linux, you can add yourself to the `docker` user group with the following command: `usermod -a -G docker $(whoami)`. Then log out and log back in.

If that command returns an error, you may not be on Linux. If you are on Linux, you may need to prepend the command with `sudo`. For the sudo command to work, though, your current user must have root privileges.
Recipe for target 'docker/build' failed make: *** [docker/build] Error 1¶
Building vanvalenlab/kiosk-console:latest from ./Dockerfile with [] build args...
ERRO[0000] failed to dial gRPC: cannot connect to the Docker daemon. Is 'docker daemon' running on this host?: d
ial unix /var/run/docker.sock: connect: permission denied
context canceled
[directory]/kiosk/build-harness/modules/docker/Makefile.build:9: recipe for target 'docker/build' failed
make: *** [docker/build] Error 1
You probably just added yourself to the `docker` user group but haven’t logged out and logged back in yet.
My pods are not autoscaling because the custom metrics are not updating!¶
Prometheus has a large memory footprint and is liable to be OOMKilled when there are many other pods running.
This can be confirmed by executing the following and inspecting the output.
kubectl describe node $(kubectl describe pod -n monitoring prometheus-prometheus-operator-prometheus-0 | grep Node: | awk '{print $2}' | cut -d '/' -f1)
The easiest way to resolve this issue is to upgrade the node types to something with more memory (`n1-standard-4` seems to work well for large clusters).
My prediction never finishes¶
A consumer should always either successfully consume a job or fail and provide an error. If a submitted prediction job never completes and the “in progress” animation keeps running, it is likely that the consumer pod has run out of memory/CPU resources. In this case, Kubernetes responds by killing the consumer before it can complete the job. To confirm that the consumer is being `Evicted`, drop to the shell and use `kubectl get pods`. There are a few ways to resolve a consumer being evicted due to resource constraints:

Submit smaller images.
Redeploy the cluster with more powerful nodes than the default `n1-standard-1`.
Increase the memory/CPU resource request in the consumer's helmfile, then redeploy it with `helm delete consumer-name; helmfile -l name=consumer-name sync`.
A prediction job may also never finish if the `tf-serving` pod never comes up. If you see that the `tf-serving` pod is not in status `Running` or has been restarting, there is likely a memory/resource issue with the model server itself. If this is the case, please read below.
My predictions keep failing and I have a lot of models (or model versions) in my models folder¶
You could be experiencing a memory issue involving TensorFlow Serving. The solution is to reduce the number of models or model versions in your `models` folder. Other possible solutions, listed in descending order of likelihood of fixing your issue, include choosing GPU instances with more memory, using smaller models, or, if possible, submitting smaller images for prediction. In our experience with `n1-highmem-2` and `n1-highmem-4` instances, we ran into issues when we had more than roughly 10 model versions total across all models in the `models` folder.
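As a quick sketch, you can list the model versions currently in your bucket with gsutil (`[BUCKET-NAME]` is a placeholder):

# list each model folder's version subfolders
gsutil ls "gs://[BUCKET-NAME]/models/*"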
I hit an error during cluster destruction¶
There may be occasions where the Kiosk fails to deploy or cluster destruction doesn’t execute properly, leaving orphaned cloud resources active. Both failed cluster deployment and failed cluster destruction can result from any number of issues, and we can’t cover all of them here. Rather, our goal is to tell you how to remove all the cloud resources your cluster is using, so that you won’t end up unknowingly leaking money.
Google Cloud (Google Kubernetes Engine)¶
The DeepCell Kiosk uses Google Kubernetes Engine to requisition resources on Google Cloud. When the cluster is fully deployed, a wide array of Google Cloud resources will be in use. If a cluster creation or destruction fails, you should log in to the Google Cloud web interface and delete the following resources by hand (n.b. the name of each resource will contain at least part of the cluster name):

Kubernetes cluster (Remember the cluster name for the following steps. Deleting the cluster removes most of the resources, and the remaining steps clean up the rest.)
any Firewall Rules associated with your cluster
any LoadBalancers associated with your cluster
any Target Pools associated with your cluster
any Persistent Disks associated with your cluster
While we hope this list is comprehensive, there could be some lingering resources used by Google Cloud and not deleted automatically that we’re not aware of.
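A sketch of how you might locate leftovers from the command line, assuming the gcloud CLI is installed (`[CLUSTER-NAME]` is a placeholder for your cluster's name):

# list resources whose names contain the cluster name
gcloud compute firewall-rules list --filter="name~[CLUSTER-NAME]"
gcloud compute target-pools list --filter="name~[CLUSTER-NAME]"
gcloud compute disks list --filter="name~[CLUSTER-NAME]"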
Tutorial: Creating a custom job¶
Rationale¶
In the Kubernetes environment created by the Kiosk, the task of processing images is coordinated by the redis-consumer. Because each redis-consumer can only process one image at a time, the number of consumers at work at any point in time is automatically scaled to match the number of images waiting in a work queue. Ultimately, the redis-consumer is responsible for sending data to tf-serving containers to retrieve model predictions, but it also handles any pre- and post-processing steps required by a particular model.
Currently, deepcell.org supports a cell-tracking feature facilitated by the `caliban-consumer`, which handles the multi-step process of cell tracking:

Send each frame of the dataset for segmentation. Frames are processed in parallel, taking advantage of the Kiosk's scalability and drastically reducing processing time.
Retrieve model predictions and run post-processing to generate cell segmentation masks.
Send cell segmentation masks for cell tracking predictions.
Compile final tracking results and post them for download.
New data processing pipelines can be implemented by writing a custom consumer. The model can be exported for tf-serving using `export_model()`.
The following variables will be used throughout the setup of the custom consumer. Pick out names that are appropriate for your consumer.
`queue_name`: the queue name that will be used to identify jobs for the custom consumer, e.g. `'track'`
`consumer_name`: the name of the custom consumer, e.g. `'caliban-consumer'`
`consumer_type`: the name of the consumer job, e.g. `'caliban'`
Designing a custom consumer¶
For guidance on the changes that need to be made to kiosk-redis-consumer, please see Custom Consumers.
Deploying a custom consumer¶
The DeepCell Kiosk uses helm and helmfile to coordinate Docker containers. This allows the Kiosk's consumers to be easily extended by setting up a new Docker image containing your custom consumer.
If you do not already have an account on Docker Hub, create one. Then sign in to Docker in your local environment using `docker login`.
From the root of the `kiosk-redis-consumer` folder, run `docker build -t <image>:<tag> .` and then `docker push <image>:<tag>`.
In the `/conf/helmfile.d/` folder in your Kiosk environment, add a new helmfile following the convention `02##.custom-consumer.yaml`. The text for the helmfile can be copied from `0250.caliban-consumer.yaml`. Then make the following changes to customize the helmfile to your consumer (a deployment sketch follows the sync command below):
Change `releases.name` to `consumer_name`
Change `releases.values.image.repository` and `releases.values.image.tag`
Change `releases.values.nameOverride` to `consumer_name`
Change `releases.values.env.QUEUE` to `queue_name`
Change `releases.values.env.CONSUMER_TYPE` to `consumer_type`
Deploy your new helmfile to the cluster with:
helmfile -l name=my-new-consumer sync
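Putting these steps together, a minimal sketch of the build-and-deploy loop; the image name, tag, and helmfile number below are placeholders:

# build and publish the custom consumer image
docker build -t myuser/my-new-consumer:0.1 .
docker push myuser/my-new-consumer:0.1

# copy the caliban helmfile as a template, then edit the fields listed above
cp conf/helmfile.d/0250.caliban-consumer.yaml conf/helmfile.d/0260.my-new-consumer.yaml

# deploy the customized consumer
helmfile -l name=my-new-consumer sync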
Autoscaling custom consumers¶
Kubernetes scales each consumer using a Horizontal Pod Autoscaler (HPA).
Each HPA is configured in /conf/addons/hpa.yaml.
The HPA reads a consumer-specific custom metric, defined in /conf/helmfile.d/0600.prometheus-operator.yaml.
Each custom metric maximizes the work being done by balancing the amount of work left in the consumer’s Redis queue (made available by the `prometheus-redis-exporter`) and the current GPU utilization.
Every job may have its own scaling requirements, and custom metrics can be tweaked to meet those requirements.
For example, `segmentation_consumer_key_ratio` in /conf/helmfile.d/0600.prometheus-operator.yaml is a more complex metric that tries to balance the ratio of TensorFlow Servers and consumers to throttle the requests-per-second.
To effectively scale your new consumer, some small edits will be needed in the following files:
/conf/helmfile.d/02##.custom-consumer.yaml
/conf/helmfile.d/0110.prometheus-redis-exporter.yaml
Within the `data.script` section of the `prometheus-redis-exporter-script` ConfigMap, modify the section `All Queues to Monitor` to include the new queue (`queue_name`):

-- All Queues to Monitor:
local queues = {}
queues[#queues+1] = "segmentation"
queues[#queues+1] = "caliban"
queues[#queues+1] = "Your New QUEUE"
for _,queue in ipairs(queues) do
...
/conf/helmfile.d/0600.prometheus-operator.yaml
Add a new `record` under `- name: custom-redis-metrics`. In the example below, make the following modifications:

Line 1: replace `caliban` with `consumer_type`
Line 3: replace `caliban` with `queue_name`
Line 12: replace `caliban` with `consumer_type`
 1  - record: caliban_consumer_key_ratio
 2    expr: |-
 3      avg_over_time(redis_script_value{key="caliban_image_keys"}[15s])
 4      / on() (
 5        avg_over_time(
 6          kube_deployment_spec_replicas{deployment="caliban-consumer"}[15s]
 7        )
 8        + 1
 9      )
10    labels:
11      namespace: deepcell
12      service: caliban-scaling-service
/conf/helmfile.d/02##.custom-consumer.yaml
Finally, in the new consumer’s helmfile, add the new metric to the `hpa` block.

Change `metadata.name` and `spec.scaleTargetRef.name` to `consumer_name`
Change `spec.metrics.object.metricName` and `spec.metrics.object.target.name` to `consumer_type`
hpa:
  enabled: true
  minReplicas: 1
  maxReplicas: 50
  metrics:
  - type: Object
    object:
      metricName: caliban_consumer_key_ratio
      target:
        apiVersion: v1
        kind: Namespace
        name: caliban_consumer_key_ratio
      targetValue: 1
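After syncing, a quick sketch for confirming that the new HPA is registered and reading its metric (the `deepcell` namespace matches the labels used above):

# list HPAs and check the new consumer's current metric value
kubectl get hpa --namespace deepcell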
Connecting custom consumers with the Kiosk¶
A number of Kiosk components will need the new queue name in order to fully integrate the new job.
/conf/helmfile.d/0300.frontend.yaml
In the kiosk-frontend helmfile (/conf/helmfile.d/0300.frontend.yaml), add or modify the `env` variable `JOB_TYPES` to include `consumer_type`:

env:
  JOB_TYPES: "segmentation,caliban,<new job name>"
/conf/helmfile.d/0220.redis-janitor.yaml
The kiosk-redis-janitor monitors the queues listed in an `env` variable `QUEUES` for stalled jobs and restarts them. `consumer_type` must be added here as well:

env:
  QUEUES: "segmentation,caliban,<new job name>"
/conf/helmfile.d/0210.autoscaler.yaml
The kiosk-autoscaler also has an `env` variable `QUEUES`, which it uses to determine whether a GPU must be activated. Add `consumer_type` to this variable too:

env:
  QUEUES: "segmentation,caliban,<new job name>"
You will need to re-sync each of these helmfiles (including the frontend, so the website reflects the change) for the updates to take effect. Please run the following:
helm delete frontend; helmfile -l name=frontend sync
helm delete redis-janitor; helmfile -l name=redis-janitor sync
helm delete autoscaler; helmfile -l name=autoscaler sync
In a few minutes the Kiosk will be ready to process the new job type.
Developer Documentation¶
Welcome to the advanced documentation for DeepCell Kiosk developers. We will go over cluster customization, accessing cluster logs and metrics, less-common deployment workflows, a few design decisions that may be of interest to other developers, and other topics.
Shell Latency¶
When testing new features or workflows, DeepCell Kiosk developers will often find themselves using the built-in terminal inside the Kiosk (accessible via the Kiosk’s main menu as the “Shell” option). This is a standard `bash` shell and should be familiar to most developers. If you are using one of the advanced Kiosk deployment workflows (which increase shell latency slightly), you should avoid printing unknown and potentially large amounts of text to the screen.
This usually only comes up in the context of logs. To prevent this issue, we recommend the following:
stern is useful for tailing the logs of multiple pods and accepts human-readable time lengths. For example, `stern consumer -s 10m` will tail the last 10 minutes of logs for all pods with “consumer” in their name.
When using `kubectl logs`, be sure to include the `--tail N` option to limit the total number of lines returned. For example, use `kubectl logs [POD_NAME] --tail 100` to return the last 100 lines of the pod’s logs.
Starting the kiosk for development¶
# Clone this repo:
git clone git@github.com:vanvalenlab/kiosk-console.git
# Initialize the "build-harness":
make init
# Build the container:
make docker/build
# Install wrapper script:
make install
# Start the kiosk
make run
Docker-in-Docker deployment workflow¶
If you’d prefer not to install anything permanently on your machine, but also prefer not to use a jumpbox, you can run the Kiosk from within a Docker container. To do this, we will use the “Docker in Docker” container created by GitHub user jpetazzo. First, clone the GitHub repository for docker-in-docker: https://github.com/jpetazzo/dind. Then enter the `dind` directory that was just created and execute:
docker build -t dind/dind .
Once that image builds successfully, you can paste the following string of commands, replacing `[dind_container]` with your chosen container name, into the terminal to create the docker-in-docker container and get a terminal prompt inside it:
docker stop [dind_container]; \
docker rm [dind_container]; \
docker run -it --privileged --name [dind_container] dind/dind
Once inside the docker-in-docker container, you have the ability to create further Docker containers, which is a necessary part of Kiosk installation. To install the Kiosk inside the docker-in-docker container and bring up the Kiosk configuration GUI, simply paste the following commands into the docker-in-docker command line:
apt-get update && \
apt-get install -y make git vim && \
git clone https://www.github.com/vanvalenlab/kiosk-console && \
cd kiosk-console && \
make init && \
git checkout master && \
sed -i 's/sudo -E //' ./Makefile && \
make docker/build && \
make install && \
kiosk-console
From here, you can configure the kiosk as usual.
Design Decisions¶
To assist future developers with any alterations/extensions they wish to make to the Kiosk codebase, here we provide some insight into our decision making process for some key components within the platform.
Database Conventions¶
We’ve elected to write a hash to Redis for every image known to the cluster. In the hash, we have a variety of fields, none of which is ever modified after creation, except for the special “status” field, which acts as an indicator to the microservices in the cluster for where the image needs to be passed next.
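As a purely illustrative sketch of this convention; the key and field names below are hypothetical, not the Kiosk's actual schema:

# create a job hash whose fields are immutable after creation...
redis-cli HSET predict:job-0001 status new input_file uploads/cells.tif
# ...except for "status", which advances as the job moves through the cluster
redis-cli HSET predict:job-0001 status started
redis-cli HGETALL predict:job-0001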
Building custom consumer pipelines¶
If you are interested in deploying your own specialized models using the kiosk, you can easily develop a custom consumer.
For a guide on how to build a custom pipeline, please see Tutorial: Creating a custom job.
Accessing cluster metrics and logging using OpenVPN¶
Setting up OpenVPN¶
After cluster startup, choose `Shell` from the main menu. On the command line, execute the following command:

POD_NAME=$(kubectl get pods --namespace "kube-system" -l app=openvpn -o jsonpath='{ .items[0].metadata.name }') && \
kubectl --namespace "kube-system" logs $POD_NAME --follow
If the OpenVPN pod has already deployed, you should see something like “Mon Apr 29 21:15:53 2019 Initialization Sequence Completed” somewhere in the output.
If you see that line, then execute:

POD_NAME=$(kubectl get pods --namespace "kube-system" -l "app=openvpn,release=openvpn" -o jsonpath='{ .items[0].metadata.name }')
SERVICE_NAME=$(kubectl get svc --namespace "kube-system" -l "app=openvpn,release=openvpn" -o jsonpath='{ .items[0].metadata.name }')
SERVICE_IP=$(kubectl get svc --namespace "kube-system" "$SERVICE_NAME" -o go-template='{{ range $k, $v := (index .status.loadBalancer.ingress 0)}}{{ $v }}{{end}}')
KEY_NAME=kubeVPN
kubectl --namespace "kube-system" exec -it "$POD_NAME" /etc/openvpn/setup/newClientCert.sh "$KEY_NAME" "$SERVICE_IP"
kubectl --namespace "kube-system" exec -it "$POD_NAME" cat "/etc/openvpn/certs/pki/$KEY_NAME.ovpn" > "$KEY_NAME.ovpn"
Then, copy the newly generated `kubeVPN.ovpn` file onto your local machine. (You can do this either by viewing the file’s contents and copy-pasting them manually, or by using a file-copying tool like SCP.)

Next, using an OpenVPN client locally, connect to the cluster with `openvpn --config kubeVPN.ovpn`. You may need to use `sudo` if the above does not work. `--data-ciphers BF-CBC` (or another cipher name) may also be required depending on your client version.
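For example, a minimal sketch of the client-side connection (the cipher flag may be unnecessary, depending on your client version):

# connect to the cluster VPN with the generated client config
sudo openvpn --config kubeVPN.ovpn --data-ciphers BF-CBC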
Cluster metrics¶
Once inside the cluster, you can connect to Grafana by going to `[service_IP]:[service_port]` for the relevant service from any web browser on your local machine. (To view the service ports and IPs, execute the command `kubectl get svc --all-namespaces` from the Kiosk’s command line.)
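As a small sketch, you can narrow the service list down to Grafana directly:

# find the Grafana service's cluster IP and port
kubectl get svc --all-namespaces | grep -i grafana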
Logging¶
For reliability reasons, logging facilities are disabled by default. To enable logging functionality, execute `export ELK_DEPLOYMENT_TOGGLE=ON; make gke/deploy/elk; make helmfile/create/elk` at the command line after cluster creation.

As with Grafana above, you can connect to Kibana by going to `[service_IP]:[service_port]` for the relevant service from any web browser on your local machine.
Recovering from failed Kiosk creations or destructions¶
There may be occasions where the Kiosk fails to deploy or cluster destruction doesn’t execute properly, leaving orphaned cloud resources active. Both failed cluster deployment and failed cluster destruction can result from any number of issues. Before you re-launch any future clusters, and to prevent unknowingly leaking money, you should remove all the vestigial cloud resources left over from the failed launch or destruction.
The DeepCell Kiosk uses Google Kubernetes Engine to requisition resources on Google Cloud. When the cluster is fully deployed, a wide array of Google Cloud resources will be in use. If a cluster creation or destruction fails, you should log in to the Google Cloud web interface and delete the following resources by hand (n.b. the name of each resource will contain at least part of the cluster name):

Kubernetes cluster (Remember the cluster name for the following steps. Deleting the cluster removes most of the resources, and the remaining steps clean up the rest.)
any Firewall Rules associated with your cluster
any LoadBalancers associated with your cluster
any Target Pools associated with your cluster
any Persistent Disks associated with your cluster
While we hope this list is comprehensive, there could be some lingering resources used by Google Cloud and not deleted automatically that we’re not aware of.
Benchmarking the DeepCell Kiosk¶
The DeepCell Kiosk comes with a utility for benchmarking the scalability and performance of a deep learning workflow. To reproduce the cost and timing benchmarks reported in the DeepCell Kiosk paper, please refer to the 2020-Bannon_et_al-Kiosk folder of our figure creation repository. To run your own benchmarking, please read below.
If you don’t already have a cloud storage bucket for use with the DeepCell Kiosk, you should create one now. It’s fine to reuse this bucket across multiple DeepCell Kiosk clusters.
There are three variables in the benchmarking pod’s YAML file, `conf/helmfile.d/0410.benchmarking.yaml`, that may need to be customized before benchmarking:

`MODEL` is the model name and version that will be used in benchmarking. The model you choose should be present in the `models/` folder of your benchmarking bucket. See the Van Valen Lab’s benchmarking bucket for an example.
`FILE` is the name of the file that will be used for benchmarking. A file by this name should be in the `uploads/` folder of your benchmarking bucket.
`COUNT` specifies how many times the `FILE` will be submitted to the cluster for processing.
Deploy a DeepCell Kiosk as you normally would. While navigating the cluster configuration menu, pay special attention to two configuration settings:
The bucket name you provide should be that of the benchmarking bucket from step 1.
The Maximum Number of GPUs has a strong effect on benchmarking time by effectively limiting how large the cluster can scale.
Once the cluster has deployed successfully, drop to the `Shell` via the DeepCell Kiosk main menu and begin the benchmarking process by executing the following command: `kubectl scale deployment benchmarking --replicas=1`

Benchmarking jobs can take a day or more, depending on the conditions (number of images and maximum number of GPUs) chosen. To monitor the status of your benchmarking job, drop to the `Shell` within the DeepCell Kiosk main menu and execute the command `stern benchmarking -s 10m`. This will show you the most recent log output from the benchmarking pod. When benchmarking has finished, the final line in the log should be `Uploaded [FILEPATH] to [BUCKET] in [SECONDS] seconds.`, where `[FILEPATH]` is the location in `[BUCKET]` where the benchmarking data has been saved.

Now that the benchmarking process has finished, clean up the benchmarking resources by executing `kubectl scale deployment benchmarking --replicas 0` at the DeepCell Kiosk’s `Shell`. This prevents the benchmarking process from executing multiple times in long-lived clusters.

Finally, you can download and analyze your benchmarking data. Two top-level fields of interest in this large JSON file are:

`time_elapsed`: the exact running time of the benchmarking procedure (seconds)
`total_node_and_networking_costs`: a slight underestimate of the total costs of the benchmarking run. (This total does not include Storage, Operation, or Storage Egress fees. These extra fees can be calculated after the fact using the Google Cloud guidelines.)