Serving Development (Predictor Method)
Introduction to currently supported Predictors and how to write them
:::warning
The traditional predictor.py deployment method is no longer best practice; custom deployment methods are preferred. Please refer to the Quick Start documentation for the recommended deployment approach.
:::
Currently, HyperAI model deployment supports two deployment modes:
- Standard `predictor.py` deployment method
- Fully custom deployment that bypasses the HyperAI-provided framework

The custom method is designed for advanced users who need fine-grained control over the deployment service. If you are unsure whether you need the custom method, you most likely don't; it is recommended to proceed with the standard method.
Standard predictor.py Method
Dependencies
In addition to libraries used in your business logic, you need an extra dependency on openbayes-serving. Please ensure this library is updated to the latest version.
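For example, assuming a pip-managed environment, the dependency can be installed or upgraded with:

```bash
pip install --upgrade openbayes-serving
```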
Directory Structure
Model deployment must include two parts:
- `predictor.py` and its dependent files, used for handling model requests
- Model files
The interface of predictor.py will be introduced one by one below. Detailed information about exporting model files can be found in Model Exporting.
Any files that need to be referenced in predictor.py must be placed in the same directory or its subdirectories. For example, if you need a classes.json file to store classification information, you can access it in the predictor.py file as follows:
```python
import json


class Predictor:
    def __init__(self):
        with open('classes.json', 'r') as f:
            values = json.load(f)
        self.values = values

    ...
```

A complete project example can be found in pytorch/image-classifier-resnet50.
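For reference, a deployment directory might be organized as follows (a hypothetical layout; the `model/` directory name is an illustration, not a requirement):

```
your-deployment/
├── predictor.py       # request handling logic
├── classes.json       # files referenced by predictor.py, in the same directory or a subdirectory
└── model/             # exported model files (see Model Exporting)
```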
Predictor
Template
The Predictor template is as follows:
```python
import openbayes_serving as serv


class Predictor:
    def __init__(self):
        """
        Responsible for loading the corresponding model and initializing metadata
        """
        pass

    def predict(self, json):
        """
        Called on every request.
        Accepts the content of the HTTP request (`json`),
        performs necessary preprocessing, predicts results,
        and finally post-processes and returns the results to the caller.

        Args:
            json: Request data
        Returns:
            Prediction results
        """
        pass


if __name__ == '__main__':  # If predictor.py is executed directly, rather than imported by other files
    serv.run(Predictor)  # Start providing service
```

The `json` parameter will be parsed according to the `Content-Type` header of the HTTP request:
- For `Content-Type: application/json`, `json` will be parsed as a dictionary (Dict) from the JSON data
- For `Content-Type: application/msgpack`, or other MessagePack type aliases, `json` will be handled in the same way as JSON (parsed as a dictionary)
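For illustration, here is a client-side sketch of both content types (the endpoint URL is a placeholder, and the optional `msgpack` package is assumed to be installed):

```python
import msgpack  # assumed to be installed; only needed for the MessagePack variant
import requests

url = "http://<your-serving-endpoint>/"  # placeholder; use your deployment's actual address
payload = {"key": "value"}

# JSON request: `requests` sets Content-Type: application/json, so `json` arrives as a dict
r = requests.post(url, json=payload)
print(r.json())

# MessagePack request: parsed into the same dict on the server side
r = requests.post(
    url,
    data=msgpack.packb(payload),
    headers={"Content-Type": "application/msgpack"},
)
print(r.json())
```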
Example
Here is the predictor.py file from pytorch/object-detector:
```python
from io import BytesIO

import requests
import torch
from PIL import Image
from torchvision import models
from torchvision import transforms

import openbayes_serving as serv


class Predictor:
    def __init__(self):
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f"using device: {self.device}")
        model = models.detection.fasterrcnn_resnet50_fpn(pretrained=False, pretrained_backbone=False)
        model.load_state_dict(torch.load("fasterrcnn_resnet50_fpn_coco-258fb6c6.pth"))
        model = model.to(self.device)
        model.eval()
        self.preprocess = transforms.Compose([transforms.ToTensor()])
        with open("coco_labels.txt") as f:
            self.coco_labels = f.read().splitlines()
        self.model = model

    def predict(self, json):
        # Only the json parameter is used; its content is a Dict, so values can be accessed directly via json[key]
        threshold = float(json["threshold"])
        image = requests.get(json["url"]).content
        img_pil = Image.open(BytesIO(image))
        img_tensor = self.preprocess(img_pil).to(self.device)
        img_tensor.unsqueeze_(0)
        with torch.no_grad():
            pred = self.model(img_tensor)
        predicted_class = [self.coco_labels[i] for i in pred[0]["labels"].cpu().tolist()]
        predicted_boxes = [
            [(i[0], i[1]), (i[2], i[3])] for i in pred[0]["boxes"].detach().cpu().tolist()
        ]
        predicted_score = pred[0]["scores"].detach().cpu().tolist()
        predicted_t = [predicted_score.index(x) for x in predicted_score if x > threshold]
        if len(predicted_t) == 0:
            return [], []
        predicted_t = predicted_t[-1]
        predicted_boxes = predicted_boxes[: predicted_t + 1]
        predicted_class = predicted_class[: predicted_t + 1]
        return predicted_boxes, predicted_class


if __name__ == '__main__':
    serv.run(Predictor)
```

Predictor Class Interface
The Predictor class does not need to inherit from other classes, but must provide at least two interfaces:
__init__
The __init__ method has a variable number of parameters with no order requirement, but is sensitive to parameter names.
Each parameter that appears represents a different functionality:
- `onnx`: Will search for `*.onnx` files in the directory where `predictor.py` is located, load them, and pass them in as this parameter
Examples:
```python
class PredictorExample1:
    def __init__(self):
        pass

    def predict(self, json):
        return {'result': 'It works!'}
```

```python
class PredictorExample2:
    def __init__(self, onnx):
        self.onnx = onnx

    def preprocess(self, v):
        ...

    def postprocess(self, v):
        ...

    def predict(self, json):
        onnx = self.onnx
        m = self.preprocess(json['data'])
        result = self.postprocess(onnx.run(None, {'data': m}))
        return result
```

predict
The predict function has a variable number of parameters with no order requirement, but is sensitive to parameter names.
Each parameter represents a different function:
- `json`: Parsed data; JSON or MessagePack data will be parsed into Python objects according to the `Content-Type` value.
- `payload`: Same as `json`, except that `json` returns a 400 error when it encounters unparseable data, while `payload` provides the unparsed data directly.
- `data`: Unparsed POST data, always of `bytes` type.
- `params`: HTTP GET parameters, a dict object.
- `headers`: HTTP headers, a dict object.
- `request`: The Flask HTTP Request object; please refer to the Flask documentation for usage.
Example:
```python
class PredictorExample1:
    def __init__(self):
        pass

    def predict(self, json, params):
        return {'message': f'Param foo is {params.get("foo")}'}
```
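These parameters can be combined as needed. The following is a hypothetical sketch (`PredictorExample3` is not one of the original examples) that reads the raw request body via `data` and inspects the headers via `headers`:

```python
class PredictorExample3:
    def __init__(self):
        pass

    def predict(self, data, headers):
        # `data` is the unparsed POST body (bytes); `headers` is a dict of HTTP headers
        return {
            'received_bytes': len(data),
            # the exact header key casing is an assumption here
            'content_type': headers.get('Content-Type'),
        }
```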
Return Values of the predict Function
The return result of the `predict` method can be:
- A JSON-serializable object, such as a Python List, Dict, or Tuple
- A `str` type string
- A `bytes` type object
- A `flask.Response` type object
Here are some examples:
```python
def predict(self, json):
    return [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

```python
def predict(self):
    return "class 1"
```

```python
import pickle

import numpy as np


def predict(self):
    # bytes type return
    array = np.random.randn(3, 3)
    response = pickle.dumps(array)
    return response
```

```python
def predict(self):
    from flask import Response

    data = b"class 1"
    response = Response(data, mimetype="text/plain")
    return response
```

Fully Customized Approach
If a `start.sh` file exists in the deployed model, all of the preparation steps above are skipped and this file is executed directly.
You are then responsible for all initialization and for starting the service yourself.
The service must listen on port 80 and handle HTTP requests.
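As a minimal sketch only (the file name `server.py` and the choice of Flask are assumptions, not requirements of HyperAI), `start.sh` could launch a small HTTP service on port 80 like this:

```python
# server.py -- hypothetical custom service; start.sh might simply run `python server.py`
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route('/', methods=['POST'])
def predict():
    payload = request.get_json()  # parse the JSON request body
    # ... load your model once at startup and run inference here ...
    return jsonify({'echo': payload})


if __name__ == '__main__':
    # The fully customized service must listen on port 80 and handle HTTP requests itself
    app.run(host='0.0.0.0', port=80)
```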