Tutorial: Creating a custom job¶
Rationale¶
In the Kubernetes environment created by the Kiosk, the task of processing images is coordinated by the redis-consumer. Because each redis-consumer can only process one image at a time, the number of consumers at work at any point in time is automatically scaled to match the number of images waiting in a work queue. Ultimately, the redis-consumer is responsible for sending data to tf-serving containers to retrieve model predictions, but it also handles any pre- and post-processing steps required by a particular model.
Currently, deepcell.org supports a cell tracking feature facilitated by the caliban-consumer, which handles the multi-step process of cell tracking:

1. Send each frame of the dataset for segmentation. Frames are processed in parallel, taking advantage of the cluster's scalability and drastically reducing processing time.
2. Retrieve model predictions and run post-processing to generate cell segmentation masks.
3. Send the cell segmentation masks for cell tracking predictions.
4. Compile the final tracking results and post them for download.
New data processing pipelines can be implemented by writing a custom consumer. The model can be exported for tf-serving using export_model().
The following variables will be used throughout the setup of the custom consumer. Pick out names that are appropriate for your consumer.
- queue_name: the queue name that will be used to identify jobs for the custom consumer, e.g. 'track'
- consumer_name: the name of the custom consumer, e.g. 'caliban-consumer'
- consumer_type: the name of the consumer job, e.g. 'caliban'
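As a concrete (and entirely hypothetical) example, a new consumer for an imaginary job might pick the following values; the names below are placeholders used for illustration, not part of the Kiosk:

```yaml
# Placeholder values for a hypothetical new job; substitute your own.
queue_name: myqueue
consumer_name: my-consumer
consumer_type: myjob
```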
Designing a custom consumer¶
For guidance on the changes that need to be made to kiosk-redis-consumer, please see Custom Consumers.
Deploying a custom consumer¶
The DeepCell Kiosk uses helm and helmfile to coordinate Docker containers. This allows the Kiosk to be easily extended with custom consumers by setting up a new Docker image containing your consumer code.
1. If you do not already have an account on Docker Hub, create one. Sign in to Docker in your local environment using docker login.
2. From the root of the kiosk-redis-consumer folder, run docker build -t <image>:<tag> . and then docker push <image>:<tag>.
3. In the /conf/helmfile.d/ folder in your kiosk environment, add a new helmfile following the naming convention 02##.custom-consumer.yaml. The text for the helmfile can be copied from 0250.caliban-consumer.yaml. Then make the following changes to customize the helmfile to your consumer:
   - Change releases.name to consumer_name
   - Change releases.values.image.repository and releases.values.image.tag to point at the image you just pushed
   - Change releases.values.nameOverride to consumer_name
   - Change releases.values.env.QUEUE to queue_name
   - Change releases.values.env.CONSUMER_TYPE to consumer_type
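As a sketch of what the result might look like (this is not a verbatim copy of 0250.caliban-consumer.yaml — the chart path and available values may differ in your Kiosk version, and all names here are placeholders):

```yaml
releases:
  - name: my-consumer                      # releases.name -> consumer_name
    namespace: deepcell
    chart: ./charts/redis-consumer         # keep whatever 0250.caliban-consumer.yaml uses
    values:
      - image:
          repository: myuser/my-consumer   # the Docker Hub repository you pushed to
          tag: "0.1"                       # the tag you pushed
        nameOverride: my-consumer          # -> consumer_name
        env:
          QUEUE: myqueue                   # -> queue_name
          CONSUMER_TYPE: myjob             # -> consumer_type
```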
Deploy your new helmfile to the cluster with:

```shell
helmfile -l name=my-new-consumer sync
```
Autoscaling custom consumers¶
Kubernetes scales each consumer using a Horizontal Pod Autoscaler (HPA).
Each HPA is configured in /conf/addons/hpa.yaml.
The HPA reads a consumer-specific custom metric, defined in /conf/helmfile.d/0600.prometheus-operator.yaml.
Each custom metric maximizes the work being done by balancing the amount of work left in the consumer’s Redis queue (made available by the prometheus-redis-exporter) and the current GPU utilization.
Every job may have its own scaling requirements, and custom metrics can be tweaked to meet those requirements.
For example, the segmentation_consumer_key_ratio in /conf/helmfile.d/0600.prometheus-operator.yaml demonstrates a more complex metric that tries to balance the ratio of TensorFlow Servers and consumers to throttle the requests-per-second.
To effectively scale your new consumer, some small edits will be needed in the following files:
/conf/helmfile.d/02##.custom-consumer.yaml
/conf/helmfile.d/0110.prometheus-redis-exporter.yaml
Within the data.script section of the prometheus-redis-exporter-script ConfigMap, modify the All Queues to Monitor section to include the new queue (queue_name):

```lua
-- All Queues to Monitor:
local queues = {}
queues[#queues+1] = "segmentation"
queues[#queues+1] = "caliban"
queues[#queues+1] = "Your New QUEUE"

for _,queue in ipairs(queues) do
...
```
/conf/helmfile.d/0600.prometheus-operator.yaml
Add a new record under - name: custom-redis-metrics. In the example below, make the following modifications:

- Line 1: replace caliban with consumer_type
- Line 3: replace caliban with queue_name
- Line 12: replace caliban with consumer_type
```yaml
 1  - record: caliban_consumer_key_ratio
 2    expr: |-
 3      avg_over_time(redis_script_value{key="caliban_image_keys"}[15s])
 4        / on() (
 5          avg_over_time(
 6            kube_deployment_spec_replicas{deployment="caliban-consumer"}[15s]
 7          )
 8          + 1
 9        )
10    labels:
11      namespace: deepcell
12      service: caliban-scaling-service
```
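For a hypothetical consumer with consumer_type myjob, queue_name myqueue, and consumer_name my-consumer, the substitutions above would yield something like the following sketch (the deployment label is assumed here to match your consumer's deployment name as well, although the instructions above only call out the record name, the key, and the service):

```yaml
- record: myjob_consumer_key_ratio
  expr: |-
    avg_over_time(redis_script_value{key="myqueue_image_keys"}[15s])
      / on() (
        avg_over_time(
          kube_deployment_spec_replicas{deployment="my-consumer"}[15s]
        )
        + 1
      )
  labels:
    namespace: deepcell
    service: myjob-scaling-service
```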
/conf/helmfile.d/02##.custom-consumer.yaml
Finally, in the new consumer’s helmfile, add the new metric to the hpa block:

- Change metadata.name and spec.scaleTargetRef.name to consumer_name
- Change spec.metrics.object.metricName and spec.metrics.object.target.name to consumer_type
```yaml
hpa:
  enabled: true
  minReplicas: 1
  maxReplicas: 50
  metrics:
  - type: Object
    object:
      metricName: caliban_consumer_key_ratio
      target:
        apiVersion: v1
        kind: Namespace
        name: caliban_consumer_key_ratio
      targetValue: 1
```
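With a hypothetical consumer_type of myjob (a placeholder name), the block above would become the following; metadata.name and spec.scaleTargetRef.name, which live elsewhere in the chart, would be set to your consumer_name:

```yaml
hpa:
  enabled: true
  minReplicas: 1
  maxReplicas: 50
  metrics:
  - type: Object
    object:
      metricName: myjob_consumer_key_ratio
      target:
        apiVersion: v1
        kind: Namespace
        name: myjob_consumer_key_ratio
      targetValue: 1
```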
Connecting custom consumers with the Kiosk¶
A number of Kiosk components will need the new queue name in order to fully integrate the new job.
/conf/helmfile.d/0300.frontend.yaml
In the kiosk-frontend helmfile (/conf/helmfile.d/0300.frontend.yaml), add or modify the env variable JOB_TYPES to include consumer_type:

```yaml
env:
  JOB_TYPES: "segmentation,caliban,<new job name>"
```
/conf/helmfile.d/0220.redis-janitor.yaml
The kiosk-redis-janitor monitors the queues listed in its env variable QUEUES for stalled jobs, and restarts them. consumer_type must be added here as well:

```yaml
env:
  QUEUES: "segmentation,caliban,<new job name>"
```
/conf/helmfile.d/0210.autoscaler.yaml
The kiosk-autoscaler also has an env variable QUEUES, which it uses to determine whether a GPU must be activated. Add consumer_type to this variable too:

```yaml
env:
  QUEUES: "segmentation,caliban,<new job name>"
```
You will need to re-sync each modified helmfile for the frontend website and the other services to pick up these changes. Please run the following:

```shell
helm delete frontend; helmfile -l name=frontend sync
helm delete redis-janitor; helmfile -l name=redis-janitor sync
helm delete autoscaler; helmfile -l name=autoscaler sync
```
In a few minutes the Kiosk will be ready to process the new job type.