Tutorial: Creating a custom job

Rationale

In the Kubernetes environment created by the kiosk, the task of processing images is coordinated by the redis-consumer. Because each redis-consumer can only process one image at a time, the number of consumers at work at any point in time is automatically scaled to match the number of images waiting in a work queue. Ultimately the redis-consumer is responsible for sending data to tf-serving containers to retrieve model predictions, but it also handles any pre- and post-processing steps that are required by a particular model.

Currently, deepcell.org supports a cell tracking feature which is facilitated by the caliban-consumer, which handles the multi-step process of cell tracking:

  1. Send each frame of the dataset for segmentation. Frames are processed in parallel across the scalable pool of consumers, drastically reducing processing time.

  2. Retrieve model predictions and run post-processing to generate cell segmentation masks

  3. Send cell segmentation masks for cell tracking predictions

  4. Compile final tracking results and post for download

New data processing pipelines can be implemented by writing a custom consumer. The model can be exported for tf-serving using export_model().
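
As a sketch, exporting a trained Keras model for tf-serving with deepcell-tf might look like the following; the import path and argument names are based on deepcell's export utilities and should be confirmed against your installed version.

# Sketch: export a trained Keras model in the SavedModel layout
# that tf-serving expects. `model` is a placeholder for your
# trained tf.keras model.
from deepcell.utils.export_utils import export_model

export_model(model,
             export_path='exported_models/my_custom_model',
             model_version=0)  # tf-serving serves numbered versions

The exported directory can then be uploaded to the bucket that your kiosk's tf-serving deployment is configured to read models from.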

The following variables will be used throughout the setup of the custom consumer. Pick out names that are appropriate for your consumer.

queue_name

Specifies the queue name that will be used to identify jobs for the Custom Consumers, e.g. 'track'

consumer_name

Name of custom consumer, e.g. 'caliban-consumer'

consumer_type

Name of consumer job, e.g. 'caliban'

Designing a custom consumer

For guidance on the changes that need to be made to kiosk-redis-consumer, please see the Custom Consumers documentation.
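
As a rough illustration of the pattern described there, a custom consumer subclasses one of the consumer base classes and implements a _consume method that pulls a job's data out of Redis, runs the model, and writes the results back. The names below (TensorFlowServingConsumer, self.predict, self.update_key, self.final_status) follow that guide's conventions, but confirm them against the version of kiosk-redis-consumer you are working from.

from redis_consumer.consumers import TensorFlowServingConsumer


class MyCustomConsumer(TensorFlowServingConsumer):
    """Sketch of a custom workflow consumer."""

    def _consume(self, redis_hash):
        # Each job's data lives in a Redis hash.
        hvals = self.redis.hgetall(redis_hash)

        # Skip jobs that have already finished.
        if hvals.get('status') in self.finished_statuses:
            return hvals.get('status')

        # Download the input file and load it as an image
        # (download_and_load is a hypothetical helper).
        image = download_and_load(hvals.get('input_file_name'))

        # Send the data to tf-serving for predictions.
        results = self.predict(image, 'CustomModel', '1')

        # Post-process and upload results, then mark the job complete.
        self.update_key(redis_hash, {'status': self.final_status})
        return self.final_status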

Deploying a custom consumer

The DeepCell Kiosk uses helm and helmfile to coordinate Docker containers. This makes custom consumers easy to deploy: build a new Docker image containing your consumer, then register it with its own helmfile.

  1. If you do not already have an account on Docker Hub, create one. Then sign in to Docker in your local environment using docker login.

  2. From the root of the kiosk-redis-consumer folder, run docker build -t <image>:<tag> . (note the trailing dot for the build context) and then docker push <image>:<tag>.

  3. In the /conf/helmfile.d/ folder in your kiosk environment, add a new helmfile following the convention 02##.custom-consumer.yaml. The text for the helmfile can be copied from 0250.caliban-consumer.yaml as shown below. Then make the following changes to customize the helmfile to your consumer.

    • Change releases.name to consumer_name

    • Change releases.values.image.repository and releases.values.image.tag

    • Change releases.values.nameOverride to consumer_name

    • Change releases.values.env.QUEUE to queue_name

    • Change releases.values.env.CONSUMER_TYPE to consumer_type

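    An example helmfile, adapted from 0250.caliban-consumer.yaml, might look like the sketch below. The chart path and the exact set of values are assumptions here, so copy the real text from 0250.caliban-consumer.yaml and edit only the fields listed above.

    releases:
      - name: my-new-consumer                # consumer_name
        namespace: deepcell
        chart: ...                           # keep the chart used by 0250.caliban-consumer.yaml
        values:
          - image:
              repository: <user>/my-new-consumer
              tag: <tag>
            nameOverride: my-new-consumer    # consumer_name
            env:
              QUEUE: my-new-queue            # queue_name
              CONSUMER_TYPE: my-new-job      # consumer_type
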
  4. Deploy your new helmfile to the cluster with:

helmfile -l name=my-new-consumer sync

Autoscaling custom consumers

Kubernetes scales each consumer using a Horizontal Pod Autoscaler (HPA). Each HPA is configured in /conf/addons/hpa.yaml. The HPA reads a consumer-specific custom metric, defined in /conf/helmfile.d/0600.prometheus-operator.yaml. Each custom metric maximizes the work being done by balancing the amount of work left in the consumer’s Redis queue (made available by the prometheus-redis-exporter) and the current GPU utilization. For example, the caliban metric defined later in this section divides the number of waiting Redis keys by the current replica count plus one: with 100 queued items and 4 replicas it evaluates to 100 / 5 = 20, well above its targetValue of 1, so the HPA adds replicas until the ratio falls back toward 1.

Every job may have its own scaling requirements, and custom metrics can be tweaked to meet those requirements. For example, the segmentation_consumer_key_ratio in /conf/helmfile.d/0600.prometheus-operator.yaml demonstrates a more complex metric that tries to balance the ratio of TensorFlow Servers and consumers to throttle the requests-per-second.

To effectively scale your new consumer, some small edits will be needed in the following files:

  1. /conf/helmfile.d/0110.prometheus-redis-exporter.yaml

    Within the data.script section of the prometheus-redis-exporter-script ConfigMap, modify the section All Queues to Monitor to include the new queue (queue_name).

    -- All Queues to Monitor:
    local queues = {}
    
    queues[#queues+1] = "segmentation"
    queues[#queues+1] = "caliban"
    queues[#queues+1] = "Your New QUEUE"
    
    for _,queue in ipairs(queues) do
        ...
    
  2. /conf/helmfile.d/0600.prometheus-operator.yaml

    Add a new record under - name: custom-redis-metrics, using the example below as a template: change the record name, the Redis key in the expression, the deployment name, and the service label to match your new consumer.

    - record: caliban_consumer_key_ratio
      expr: |-
        avg_over_time(redis_script_value{key="caliban_image_keys"}[15s])
        / on()
        (
            avg_over_time(kube_deployment_spec_replicas{deployment="caliban-consumer"}[15s])
            +
            1
        )
      labels:
        namespace: deepcell
        service: caliban-scaling-service
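
    Adapted for a new consumer, the record might look like the sketch below. The record name, deployment, and service label are hypothetical, and the Redis key must match whatever key name your prometheus-redis-exporter script exports for the new queue.

    - record: my_new_consumer_key_ratio
      expr: |-
        avg_over_time(redis_script_value{key="my_new_queue_image_keys"}[15s])
        / on()
        (
            avg_over_time(kube_deployment_spec_replicas{deployment="my-new-consumer"}[15s])
            +
            1
        )
      labels:
        namespace: deepcell
        service: my-new-consumer-scaling-service
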
  3. /conf/helmfile.d/02##.custom-consumer.yaml

    Finally, in the new consumer’s helmfile, add the new metric to the hpa block.

    • Change metadata.name and spec.scaleTargetRef.name to consumer_name

    • Change spec.metrics.object.metricName and spec.metrics.object.target.name to the new metric's record name (caliban_consumer_key_ratio in the example below)

    hpa:
      enabled: true
      minReplicas: 1
      maxReplicas: 50
      metrics:
      - type: Object
        object:
          metricName: caliban_consumer_key_ratio
          target:
            apiVersion: v1
            kind: Namespace
            name: caliban_consumer_key_ratio
          targetValue: 1
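
    After re-syncing the helmfile, you can confirm that the HPA exists and is reading the metric; the deepcell namespace here is taken from the labels above, so adjust it if your consumer runs elsewhere.

    kubectl -n deepcell get hpa my-new-consumer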

Connecting custom consumers with the Kiosk

A number of Kiosk components will need the new queue name in order to fully integrate the new job.

  1. /conf/helmfile.d/0300.frontend.yaml

    In the kiosk-frontend helmfile, add or modify the env variable JOB_TYPES to include consumer_type.

    env:
        JOB_TYPES: "segmentation,caliban,<new job name>"
    
  2. /conf/helmfile.d/0220.redis-janitor.yaml

    The kiosk-redis-janitor monitors the queues listed in its env variable QUEUES for stalled jobs and restarts them. consumer_type must be added here as well.

    env:
        QUEUES: "segmentation,caliban,<new job name>"
    
  3. /conf/helmfile.d/0210.autoscaler.yaml

    The kiosk-autoscaler also has an env variable QUEUES which it uses to determine whether a GPU must be activated. Add consumer_type to this variable too.

    env:
        QUEUES: "segmentation,caliban,<new job name>"
    

You will need to re-sync each of these releases for the changes to take effect. Please run the following:

helm delete frontend; helmfile -l name=frontend sync
helm delete redis-janitor; helmfile -l name=redis-janitor sync
helm delete autoscaler; helmfile -l name=autoscaler sync
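
Once the releases are re-created, you can watch the pods come back up; the deepcell namespace is assumed from the examples above, so adjust it if your releases run elsewhere.

kubectl -n deepcell get pods --watch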

In a few minutes the Kiosk will be ready to process the new job type.