Super-Resolution Service from Scratch (Predictor Method)

:::warning
The traditional predictor.py deployment method is no longer the best practice. It is recommended to use the custom deployment method instead. Please refer to the Quick Start documentation for the recommended deployment approach.
:::

The dependency installation described in the Dependency Management section will be executed when "Model Deployment" starts. To accelerate service startup, you can use a custom image with all dependencies pre-installed.

MMEditing is an open-source image and video editing library based on PyTorch, part of the OpenMMLab project. MMEditing can perform image inpainting, matting, super-resolution, and generation tasks. This tutorial uses super-resolution as an example to build a Serving service.

Walk Through the Process in the Development Environment

First, create a new "Model Training" compute container, select a PyTorch image, and name it MMEditing.

After the Jupyter workspace starts, open a Terminal, clone the MMEditing repository, find the documentation link in MMEditing's README, and follow the documentation for installation.

The container already provides CUDA, PyTorch, and torchvision components, so you can start from step c.

Execute pip install --user -r requirements.txt. One package requires compilation, so it will take some time.

Finally, execute pip install --user -v . to install the mmediting library itself. Do not add the -e flag (development mode); install it normally, since the library will also be needed after the model goes online.

Next, find the pre-trained EDSR model in the documentation under Model Zoo / Restoration Models. Download the configuration and model files and upload them to the home directory of the compute container (you can use the upload button in the Jupyter workspace or simply drag them onto the file list).

After uploading, open a new Notebook in the home directory and execute:

import mmedit

If there are no errors, it means our environment is set up.
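If you want a slightly stronger check than a bare import, you can also print the installed version and confirm that the GPU is visible (a minimal sketch; it assumes the usual mmedit.__version__ attribute exposed by OpenMMLab packages):

import mmedit
import torch

# Report the installed MMEditing version and whether CUDA is available
print(mmedit.__version__)
print(torch.cuda.is_available())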

Next, upload a low-resolution image for testing. We'll call it low-resolution.png.

In the MMEditing repository at demo/restoration_demo.py, you can find sample code for super-resolution.

We'll rewrite it and experiment with it in a Notebook.

import mmedit, mmcv, torch
from mmedit.apis import init_model, restoration_inference
from mmedit.core import tensor2img

model = init_model(
    "edsr_x2c64b16_g1_300k_div2k.py",
    "edsr_x2c64b16_1x16_300k_div2k_20200604-19fe95ea.pth",
    device=torch.device('cuda', 0)
)

output = restoration_inference(model, "low-resolution.png")
output = tensor2img(output)
mmcv.imwrite(output, "high-resolution.png")

import PIL
low = PIL.Image.open('low-resolution.png')
high = PIL.Image.open('high-resolution.png')
display(low, high)

At this point, we have successfully performed super-resolution processing on an image.

Writing predictor.py for the Serving Service

The specific method for writing predictor.py is introduced in Serving Service Development, so we won't explain it in detail here.

import openbayes_serving as serv

import mmedit, mmcv, torch
import cv2
from mmedit.apis import init_model, restoration_inference
from mmedit.core import tensor2img

import tempfile


class Predictor:
    def __init__(self):
        self.model = init_model(
            "edsr_x2c64b16_g1_300k_div2k.py",
            "edsr_x2c64b16_1x16_300k_div2k_20200604-19fe95ea.pth",
            device=torch.device('cuda', 0)
        )

    def predict(self, data):
        # Here we specify that the input data is an image directly POSTed
        f = tempfile.NamedTemporaryFile()
        f.write(data)
        f.seek(0)
        output = restoration_inference(self.model, f.name)
        _, img = cv2.imencode('.png', output)
        return img

if __name__ == '__main__':
    serv.run(Predictor)

You can see that it's basically the code we wrote in the Notebook earlier, filled into a fixed framework. Save the file as predictor.py, then open a new Terminal and start it with python predictor.py.

Then test it in the Notebook by POSTing to the local access address:

import requests

with open('low-resolution.png', 'rb') as f:
    img = f.read()

resp = requests.post('http://0.0.0.0:8080', data=img)
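Before looking at the server logs, you can also inspect the response object itself; exactly what comes back depends on openbayes_serving's error handling, but at this point it will most likely not be a successful response:

# Inspect the response; at this point it is likely an error status rather than image data
print(resp.status_code)
print(resp.text[:500])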

However, going back to the predictor.py Terminal window, we find a very strange error.

cv2.imencode fails, most likely because the output argument passed to it is not quite right. Following the prompt in the error message, we open the debug link and inspect the state of the output variable.

We find that output is a PyTorch Tensor, but OpenCV doesn't recognize Tensors—it only knows numpy arrays.

Re-examining the code, we discover that we missed the line output = tensor2img(output) (refer to the previous code). Let's first experiment in the debug window.
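In the debug console you can try something along these lines to confirm the fix before editing predictor.py (a sketch; output here is the Tensor captured at the point of failure, and tensor2img and cv2 are already imported at the top of predictor.py):

# Convert the output Tensor to a uint8 numpy image, then check that OpenCV can encode it
img = tensor2img(output)
ok, buf = cv2.imencode('.png', img)
print(type(img), img.shape, ok)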

We find it can now run successfully, so we add this line back and run python predictor.py again, then test once more.

Still an error, but this time it's a very obvious one: we need to convert the resulting numpy array to bytes before returning it.

Change the final return img to return img.tobytes() and that should do it. Test again, and the result is successfully returned.
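For reference, the corrected predict method then looks roughly like this, with the tensor2img conversion restored and the result returned as bytes:

    def predict(self, data):
        # The request body is the raw image that was POSTed
        f = tempfile.NamedTemporaryFile()
        f.write(data)
        f.seek(0)
        output = restoration_inference(self.model, f.name)
        # Convert the output Tensor to a numpy image before encoding
        output = tensor2img(output)
        _, img = cv2.imencode('.png', output)
        # Return the encoded PNG as raw bytes
        return img.tobytes()

On the client side, resp.content now contains the PNG data returned by the service.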

Finally, Deploy!

Development is complete, and we can clean up unnecessary items. The final result looks like this:

Stop the compute container, wait for synchronization to complete, then find "Models" in the left sidebar under "Data Repository" and click "Create New Model".

Then return to the stopped compute container and click "Copy Current Directory to Data Repository".

The processed model should look like this:

Then find "Model Deployment" under "Compute Containers", click "Create New Deployment", select the same image as used during development, and choose a GPU compute resource.

Bind the model you just prepared and click "Deploy".

Wait for the model to start up. When you see the following screen, the deployment has succeeded.

Then get the API address from the "Overview" section of the model deployment.

Try it somewhere else:
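For example, repeat the earlier test against the public address instead of the local one (a sketch; the URL below is a placeholder, use the API address shown in your deployment's "Overview"):

import requests

# Placeholder; copy the real address from the deployment's "Overview" page
API_URL = 'https://<your-deployment-address>'

with open('low-resolution.png', 'rb') as f:
    img = f.read()

resp = requests.post(API_URL, data=img)

# The response body is the PNG produced by the service
with open('high-resolution.png', 'wb') as f:
    f.write(resp.content)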

At this point, the super-resolution service has been successfully deployed online.

Additionally, let's do a simple performance test

Note that this is not a case designed to showcase performance. To reduce cognitive load, the tutorial uses the most straightforward implementation method, so the service performance is not optimal. This section is mainly used to demonstrate the usage of "Request Statistics".

We wrap the final requests.post in while True: to continuously send requests to the service, and then observe in "Request Statistics":
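In the Notebook this can be as simple as the following (a sketch; API_URL is the same placeholder as above, and the loop is stopped by interrupting the kernel):

import requests

API_URL = 'https://<your-deployment-address>'  # placeholder; use your deployment's address

with open('low-resolution.png', 'rb') as f:
    img = f.read()

# Send requests back to back; interrupt the kernel to stop
while True:
    requests.post(API_URL, data=img)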

We observe that processing one request takes an average (avg) of 109ms. This figure is measured on the backend and does not include connection setup, data upload, or data return on the client side. The maximum (max) request time is not far from the 50th percentile (median), so the actual processing time for each request is very stable.

The interval between data points is 10s, and each data point captures 61 requests, so the QPS is 6.1. The average time per request is therefore 10s / 61 ≈ 164ms, which is the request time actually perceived by the client.

Then open several Notebooks, send requests to the service in parallel, and observe again.
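If you would rather drive the parallel load from a single Notebook, a small thread pool is one option (a rough sketch, roughly equivalent to opening several Notebooks; API_URL is the same placeholder as above):

import requests
from concurrent.futures import ThreadPoolExecutor

API_URL = 'https://<your-deployment-address>'  # placeholder; use your deployment's address

with open('low-resolution.png', 'rb') as f:
    img = f.read()

def hammer():
    # Each worker sends requests back to back
    while True:
        requests.post(API_URL, data=img)

# Run several senders in parallel (restart the kernel to stop)
with ThreadPoolExecutor(max_workers=4) as pool:
    for _ in range(4):
        pool.submit(hammer)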

Then we find that the QPS has risen to 20.5, but the higher request-time percentiles have already diverged from the 50th percentile. This is what happens when the service is under heavy load, and these metrics will help you evaluate the actual quality of service.