Tutorial: Creating a custom job

Rationale

In the Kubernetes environment created by the kiosk, the task of processing images is coordinated by the redis-consumer. Because each redis-consumer can only process one image at a time, the number of consumers at work at any point in time is automatically scaled to match the number of images waiting in a work queue. Ultimately the redis-consumer is responsible for sending data to tf-serving containers to retrieve model predictions, but it also handles any pre- and post-processing steps that are required by a particular model.

Currently, deepcell.org supports a cell tracking feature which is facilitated by the caliban-consumer, which handles the multi-step process of cell tracking:

  1. Send each frame of the dataset for segmentation. Frames are processed in parallel across the scalable pool of consumers, drastically reducing processing time.

  2. Retrieve model predictions and run post-processing to generate cell segmentation masks

  3. Send cell segmentation masks for cell tracking predictions

  4. Compile final tracking results and post for download

New data processing pipelines can be implemented by writing a custom consumer. The model can be exported for tf-serving using export_model().
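
As a sketch, exporting a trained Keras model for tf-serving with deepcell-tf might look like the following; the import path and argument names are based on deepcell's export utilities and should be confirmed against your installed version.

# Sketch: export a trained Keras model in the SavedModel layout
# that tf-serving expects. `model` is a placeholder for your
# trained tf.keras model.
from deepcell.utils.export_utils import export_model

export_model(model,
             export_path='exported_models/my_custom_model',
             model_version=0)  # tf-serving serves numbered versions

The exported directory can then be uploaded to the bucket that your kiosk's tf-serving deployment is configured to read models from.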

The following variables will be used throughout the setup of the custom consumer. Pick out names that are appropriate for your consumer.

queue_name

Specifies the queue name that will be used to identify jobs for the Custom Consumers, e.g. 'track'

consumer_name

Name of custom consumer, e.g. 'caliban-consumer'

consumer_type

Name of consumer job, e.g. 'caliban'

Designing a custom consumer

For guidance on the changes that need to be made to kiosk-redis-consumer, please see the Custom Consumers documentation.
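
As a rough illustration of the pattern described there, a custom consumer subclasses one of the consumer base classes and implements a _consume method that pulls a job's data out of Redis, runs the model, and writes the results back. The names below (TensorFlowServingConsumer, self.predict, self.update_key, self.final_status) follow that guide's conventions, but confirm them against the version of kiosk-redis-consumer you are working from.

from redis_consumer.consumers import TensorFlowServingConsumer


class MyCustomConsumer(TensorFlowServingConsumer):
    """Sketch of a custom workflow consumer."""

    def _consume(self, redis_hash):
        # Each job's data lives in a Redis hash.
        hvals = self.redis.hgetall(redis_hash)

        # Skip jobs that have already finished.
        if hvals.get('status') in self.finished_statuses:
            return hvals.get('status')

        # Download the input file and load it as an image
        # (download_and_load is a hypothetical helper).
        image = download_and_load(hvals.get('input_file_name'))

        # Send the data to tf-serving for predictions.
        results = self.predict(image, 'CustomModel', '1')

        # Post-process and upload results, then mark the job complete.
        self.update_key(redis_hash, {'status': self.final_status})
        return self.final_status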

Deploying a custom consumer

The DeepCell Kiosk uses helm and helmfile to coordinate Docker containers. This makes custom consumers easy to deploy: build a new Docker image containing your consumer, then register it with its own helmfile.

  1. If you do not already have an account on Docker Hub, create one. Then sign in to Docker in your local environment using docker login.

  2. From the root of the kiosk-redis-consumer folder, run docker build -t <image>:<tag> . (note the trailing dot for the build context) and then docker push <image>:<tag>.

  3. In the /conf/helmfile.d/ folder in your kiosk environment, add a new helmfile following the convention 02##.custom-consumer.yaml. The text for the helmfile can be copied from 0250.caliban-consumer.yaml as shown below. Then make the following changes to customize the helmfile to your consumer.

    • Change releases.name to consumer_name

    • Change releases.values.image.repository and releases.values.image.tag

    • Change releases.values.nameOverride to consumer_name

    • Change releases.values.env.QUEUE to queue_name

    • Change releases.values.env.CONSUMER_TYPE to consumer_type

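    An example helmfile, adapted from 0250.caliban-consumer.yaml, might look like the sketch below. The chart path and the exact set of values are assumptions here, so copy the real text from 0250.caliban-consumer.yaml and edit only the fields listed above.

    releases:
      - name: my-new-consumer                # consumer_name
        namespace: deepcell
        chart: ...                           # keep the chart used by 0250.caliban-consumer.yaml
        values:
          - image:
              repository: <user>/my-new-consumer
              tag: <tag>
            nameOverride: my-new-consumer    # consumer_name
            env:
              QUEUE: my-new-queue            # queue_name
              CONSUMER_TYPE: my-new-job      # consumer_type
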
  4. Deploy your new helmfile to the cluster with:

helmfile -l name=my-new-consumer sync

Autoscaling custom consumers

Kubernetes scales each consumer using a Horizontal Pod Autoscaler (HPA). Each HPA is configured in /conf/addons/hpa.yaml. The HPA reads a consumer-specific custom metric, defined in /conf/helmfile.d/0600.prometheus-operator.yaml. Each custom metric maximizes the work being done by balancing the amount of work left in the consumer’s Redis queue (made available by the prometheus-redis-exporter) and the current GPU utilization. For example, the caliban metric defined later in this section divides the number of waiting Redis keys by the current replica count plus one: with 100 queued items and 4 replicas it evaluates to 100 / 5 = 20, well above its targetValue of 1, so the HPA adds replicas until the ratio falls back toward 1.

Every job may have its own scaling requirements, and custom metrics can be tweaked to meet those requirements. For example, the segmentation_consumer_key_ratio in /conf/helmfile.d/0600.prometheus-operator.yaml demonstrates a more complex metric that tries to balance the ratio of TensorFlow Servers and consumers to throttle the requests-per-second.

To effectively scale your new consumer, some small edits will be needed in the following files:

  1. /conf/helmfile.d/0110.prometheus-redis-exporter.yaml

    Within the data.script section of the prometheus-redis-exporter-script ConfigMap, modify the section All Queues to Monitor to include the new queue (queue_name).

    -- All Queues to Monitor:
    local queues = {}
    
    queues[#queues+1] = "segmentation"
    queues[#queues+1] = "caliban"
    queues[#queues+1] = "Your New QUEUE"
    
    for _,queue in ipairs(queues) do
        ...
    
  2. /conf/helmfile.d/0600.prometheus-operator.yaml

    Add a new record under - name: custom-redis-metrics, using the example below as a template: change the record name, the Redis key in the expression, the deployment name, and the service label to match your new consumer.

    - record: caliban_consumer_key_ratio
      expr: |-
        avg_over_time(redis_script_value{key="caliban_image_keys"}[15s])
        / on()
        (
            avg_over_time(kube_deployment_spec_replicas{deployment="caliban-consumer"}[15s])
            +
            1
        )
      labels:
        namespace: deepcell
        service: caliban-scaling-service
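
    Adapted for a new consumer, the record might look like the sketch below. The record name, deployment, and service label are hypothetical, and the Redis key must match whatever key name your prometheus-redis-exporter script exports for the new queue.

    - record: my_new_consumer_key_ratio
      expr: |-
        avg_over_time(redis_script_value{key="my_new_queue_image_keys"}[15s])
        / on()
        (
            avg_over_time(kube_deployment_spec_replicas{deployment="my-new-consumer"}[15s])
            +
            1
        )
      labels:
        namespace: deepcell
        service: my-new-consumer-scaling-service
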
  3. /conf/helmfile.d/02##.custom-consumer.yaml

    Finally, in the new consumer’s helmfile, add the new metric to the hpa block.

    • Change metadata.name and spec.scaleTargetRef.name to consumer_name

    • Change spec.metrics.object.metricName and spec.metrics.object.target.name to the new metric's record name (caliban_consumer_key_ratio in the example below)

    hpa:
      enabled: true
      minReplicas: 1
      maxReplicas: 50
      metrics:
      - type: Object
        object:
          metricName: caliban_consumer_key_ratio
          target:
            apiVersion: v1
            kind: Namespace
            name: caliban_consumer_key_ratio
          targetValue: 1
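
    After re-syncing the helmfile, you can confirm that the HPA exists and is reading the metric; the deepcell namespace here is taken from the labels above, so adjust it if your consumer runs elsewhere.

    kubectl -n deepcell get hpa my-new-consumer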

Connecting custom consumers with the Kiosk

A number of Kiosk components will need the new queue name in order to fully integrate the new job.

  1. /conf/helmfile.d/0300.frontend.yaml

    In the kiosk-frontend helmfile, add or modify the env variable JOB_TYPES to include consumer_type.

    env:
        JOB_TYPES: "segmentation,caliban,<new job name>"
    
  2. /conf/helmfile.d/0220.redis-janitor.yaml

    The kiosk-redis-janitor monitors the queues listed in its env variable QUEUES for stalled jobs and restarts them. consumer_type must be added here as well.

    env:
        QUEUES: "segmentation,caliban,<new job name>"
    
  3. /conf/helmfile.d/0210.autoscaler.yaml

    The kiosk-autoscaler also has an env variable QUEUES which it uses to determine whether a GPU must be activated. Add consumer_type to this variable too.

    env:
        QUEUES: "segmentation,caliban,<new job name>"
    

You will need to re-sync each of these releases for the changes to take effect. Please run the following:

helm delete frontend; helmfile -l name=frontend sync
helm delete redis-janitor; helmfile -l name=redis-janitor sync
helm delete autoscaler; helmfile -l name=autoscaler sync
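
Once the releases are re-created, you can watch the pods come back up; the deepcell namespace is assumed from the examples above, so adjust it if your releases run elsewhere.

kubectl -n deepcell get pods --watch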

In a few minutes the Kiosk will be ready to process the new job type.