
Serving Development (Predictor Method)

Introduction to currently supported Predictors and how to write them

:::warning
The traditional predictor.py deployment method is no longer best practice; the custom deployment method is recommended instead. Please refer to the Quick Start documentation for the recommended deployment approach.
:::

Currently, HyperAI model deployment supports two deployment modes:

  1. Standard predictor.py deployment method
  2. Fully custom deployment that bypasses the HyperAI provided framework

The custom method is designed for advanced users who need fine-grained control over the deployment service. If you are unsure whether you need the custom method, you don't; proceed with the standard method.

Standard predictor.py Method

Dependencies

In addition to the libraries used by your business logic, you need one extra dependency: openbayes-serving (installable via, e.g., `pip install -U openbayes-serving`). Please make sure this library is updated to the latest version.

Directory Structure

Model deployment must include two parts:

  1. predictor.py and its dependent files, used for handling model requests
  2. Model files

The interface of predictor.py is described piece by piece below. For details on exporting model files, see Model Exporting.

Any files that need to be referenced in predictor.py must be placed in the same directory or its subdirectories. For example, if you need a classes.json file to store classification information, you can access it in the predictor.py file as follows:

```python
import json


class Predictor:
    def __init__(self):
        with open('classes.json', 'r') as f:
            values = json.load(f)
        self.values = values

    ...
```

A complete project example can be found in pytorch/image-classifier-resnet50.

Predictor

Template

The Predictor template is as follows:

```python
import openbayes_serving as serv


class Predictor:
    def __init__(self):
        """
        Responsible for loading the model and initializing any metadata
        """
        pass

    def predict(self, json):
        """
        Called on every request.
        Receives the content of the HTTP request (`json`),
        performs the necessary preprocessing, runs the prediction,
        then post-processes the result and returns it to the caller.

        Args:
            json: request data

        Returns:
            prediction results
        """
        pass


if __name__ == '__main__':  # If predictor.py is executed directly rather than imported by another file
    serv.run(Predictor)  # Start serving
```

The json parameter is parsed according to the Content-Type header of the HTTP request (a client-side example follows this list):

  • For Content-Type: application/json, json will be parsed as a dictionary (Dict) in JSON format
  • For Content-Type: application/msgpack, or other MessagePack type aliases, json will be handled in the same way as JSON format (parsed as a dictionary)
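For example, a client can exercise either content type with the requests library. This is a minimal sketch: the service URL is a placeholder, and the msgpack package is assumed to be available on the client.

```python
import msgpack  # assumed client-side dependency
import requests

url = "http://localhost:8080"  # placeholder: address of your deployed service

# Content-Type: application/json -> arrives in predict() as a parsed dict
requests.post(url, json={"threshold": 0.5})

# Content-Type: application/msgpack -> parsed into the same dict form
requests.post(
    url,
    data=msgpack.packb({"threshold": 0.5}),
    headers={"Content-Type": "application/msgpack"},
)
```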

Example

Here is the predictor.py file from pytorch/object-detector:

```python
from io import BytesIO

import requests
import torch
from PIL import Image
from torchvision import models
from torchvision import transforms

import openbayes_serving as serv


class Predictor:
    def __init__(self):
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f"using device: {self.device}")

        model = models.detection.fasterrcnn_resnet50_fpn(pretrained=False, pretrained_backbone=False)
        model.load_state_dict(torch.load("fasterrcnn_resnet50_fpn_coco-258fb6c6.pth"))
        model = model.to(self.device)
        model.eval()

        self.preprocess = transforms.Compose([transforms.ToTensor()])

        with open("coco_labels.txt") as f:
            self.coco_labels = f.read().splitlines()

        self.model = model

    def predict(self, json):
        # Only the json parameter is used; it is a dict, so fields can be read as json[key]
        threshold = float(json["threshold"])
        image = requests.get(json["url"]).content
        img_pil = Image.open(BytesIO(image))
        img_tensor = self.preprocess(img_pil).to(self.device)
        img_tensor.unsqueeze_(0)

        with torch.no_grad():
            pred = self.model(img_tensor)

        predicted_class = [self.coco_labels[i] for i in pred[0]["labels"].cpu().tolist()]
        predicted_boxes = [
            [(i[0], i[1]), (i[2], i[3])] for i in pred[0]["boxes"].detach().cpu().tolist()
        ]
        predicted_score = pred[0]["scores"].detach().cpu().tolist()
        # Scores are sorted in descending order, so keep the prefix above the threshold
        predicted_t = [predicted_score.index(x) for x in predicted_score if x > threshold]
        if len(predicted_t) == 0:
            return [], []

        predicted_t = predicted_t[-1]
        predicted_boxes = predicted_boxes[: predicted_t + 1]
        predicted_class = predicted_class[: predicted_t + 1]
        return predicted_boxes, predicted_class


if __name__ == '__main__':
    serv.run(Predictor)
```
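Assuming the service above is deployed and reachable (the URL below is a placeholder), it can be called like this; the JSON response mirrors the (boxes, classes) tuple returned by predict:

```python
import requests

url = "http://localhost:8080"  # placeholder: address of the deployed detector

payload = {
    "url": "https://example.com/street.jpg",  # image to run detection on
    "threshold": 0.8,  # minimum confidence score to keep
}
boxes, classes = requests.post(url, json=payload).json()
print(classes)
```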

Predictor Class Interface

The Predictor class does not need to inherit from other classes, but must provide at least two interfaces:

__init__

The __init__ method accepts a variable set of parameters; their order does not matter, but their names do.

Each parameter that appears represents a different functionality:

  • onnx: Searches for *.onnx files in the directory where predictor.py is located, loads them, and passes them in through this parameter

Examples:

```python
class PredictorExample1:
    def __init__(self):
        pass

    def predict(self, json):
        return {'result': 'It works!'}
```

```python
class PredictorExample2:
    def __init__(self, onnx):
        self.onnx = onnx

    def preprocess(self, v):
        ...

    def postprocess(self, v):
        ...

    def predict(self, json):
        onnx = self.onnx
        m = self.preprocess(json['data'])
        result = self.postprocess(onnx.run(None, {'data': m}))
        return result
```
predict

Like __init__, the predict method accepts a variable set of parameters; their order does not matter, but their names do.

Each parameter name provides a different piece of the request:

  • json: Parsed data, will parse JSON or MessagePack data into Python objects according to the Content-Type value.
  • payload: Same as json, except that json responds with a 400 error when the body cannot be parsed, whereas payload hands over the raw, unparsed data instead.
  • data: Unparsed POST data, always of bytes type.
  • params: HTTP GET parameters, a dict object.
  • headers: HTTP headers, a dict object.
  • request: The Flask Request object; refer to the Flask documentation for usage.

Example:

```python
class PredictorExample1:
    def __init__(self):
        pass

    def predict(self, json, params):
        return {'message': f'Param foo is {params.get("foo")}'}
```
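The other parameter names can be combined in the same way. As an illustrative sketch (not part of any shipped example), a predictor that accepts a raw binary upload via data and inspects headers might look like this:

```python
from io import BytesIO

from PIL import Image


class BinaryPredictor:
    def __init__(self):
        pass

    def predict(self, data, headers):
        # data is the raw POST body as bytes, regardless of Content-Type
        content_type = headers.get("Content-Type")
        print(f"received {len(data)} bytes, Content-Type: {content_type}")
        img = Image.open(BytesIO(data))  # e.g. an uploaded image
        return {"width": img.width, "height": img.height}
```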

Return Values of the predict Function

The return result of the predict method can be:

  • A JSON-serializable object, such as List, Dict, Tuple in Python
  • A str type string
  • A bytes type object
  • A flask.Response type object

Here are some examples:

```python
# A JSON-serializable object
def predict(self, json):
    return [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

```python
# A str
def predict(self):
    return "class 1"
```

```python
# A bytes object
def predict(self):
    import pickle

    import numpy as np

    array = np.random.randn(3, 3)
    response = pickle.dumps(array)
    return response
```

```python
# A flask.Response object
def predict(self):
    from flask import Response

    data = b"class 1"
    response = Response(data, mimetype="text/plain")
    return response
```
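On the client side, a bytes response such as the pickled array above can be recovered like this (a sketch; the URL is a placeholder, and numpy must be installed on the client for unpickling to succeed):

```python
import pickle

import requests

url = "http://localhost:8080"  # placeholder: address of the deployed service

resp = requests.post(url, json={})
# Only unpickle responses from services you trust
array = pickle.loads(resp.content)
print(array.shape)
```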

Fully Customized Approach

If a start.sh file exists in the deployed model, all of the preparation steps above are skipped and this file is executed directly. You must handle all initialization and service startup yourself.

The service must listen on port 80 and handle HTTP requests.
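As a minimal sketch of this approach, start.sh could contain nothing more than `python server.py`, where server.py is an ordinary HTTP server bound to port 80. Flask is used below purely as an illustration; any framework works:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Do all model loading and other initialization yourself, e.g. at import time


@app.route("/", methods=["POST"])
def predict():
    payload = request.get_json()
    # ... run your model on `payload` here ...
    return jsonify({"result": "It works!"})


if __name__ == "__main__":
    # The platform routes external traffic to port 80
    app.run(host="0.0.0.0", port=80)
```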