
Recognizing Handwritten Digits with HyperAI

If you are familiar with machine learning, especially the deep learning methods that have become popular in recent years, you have likely heard of the MNIST dataset. It comes from the National Institute of Standards and Technology (NIST): the training set consists of handwritten digits from 250 different writers, and the test set contains handwritten digits from writers in the same proportion. In total there are 60,000 training images and 10,000 test images, each a 28×28 grayscale image of a single digit.

Here we will use this dataset to introduce image classification using PyTorch / TensorFlow in HyperAI. This will involve HyperAI's dataset binding, model training, and model usage.

PyTorch

Get the Code

Download the sample code from GitHub:

git clone https://github.com/signcl/openbayes-mnist-example.git
cd openbayes-mnist-example/pytorch

After switching to the downloaded code directory, you will see the following code structure:

.
├── openbayes.yaml
└── train.py

Where:

  1. train.py contains the code for training the machine learning model and performing inference with it
  2. openbayes.yaml contains the configuration for a "Python Script Execution"; for more information, see the openbayes cli Configuration File

Create Your First Task

:::caution Note The process introduced here is for uploading a code archive through the web interface, but this is no longer our recommended best practice for creating "Python Script Execution". A better approach can be found at bayes Command Line Tool Getting Started. :::

After preparing the dataset, click "New Compute Container" in the left navigation bar on the page to create a Python script execution task, using train.py to train a PyTorch handwritten digit recognition model.

On the "New Container" page, select Python Script Execution as the access method, and upload train.py from the directory. In the execution command field, enter the command:

python train.py

The parameter parsing in train.py uses Python's built-in argparse library. For more information, see argparse — Parser for command-line options, arguments and sub-commands.
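As a rough illustration of how a training script declares its options with argparse, consider the sketch below. The flag names (`--epochs`, `--lr`) and defaults are illustrative assumptions, not necessarily the flags the repository's train.py actually defines:

```python
import argparse

# Minimal argparse sketch for a training script. The flag names and
# defaults here are illustrative; see train.py in the repository for
# the options it actually accepts.
parser = argparse.ArgumentParser(description="Train an MNIST classifier")
parser.add_argument("--epochs", type=int, default=10,
                    help="number of training epochs")
parser.add_argument("--lr", type=float, default=0.01,
                    help="learning rate")

# Passing a list instead of reading sys.argv, for demonstration:
args = parser.parse_args(["--epochs", "5"])
print(args.epochs, args.lr)  # 5 0.01
```

Running `python train.py --epochs 5` would have the same effect; with no arguments, the defaults apply.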

Select CPU for compute resources (if GPU resources are available, a GPU is recommended as it will be much faster), then select the "pytorch-1.8" image.

After submitting the task, wait about 15 seconds for the task to start executing. On the container page, you can see the execution status displayed in the logs.

TensorFlow

Get the Code

Download the sample code from GitHub:

git clone https://github.com/signcl/openbayes-mnist-example.git
cd openbayes-mnist-example/tensorflow

Switch to the downloaded code directory and you will see the following code structure:

.
├── inference.py
├── openbayes.yaml
└── train.py

Where:

  1. train.py contains the code for training the machine learning model, and inference.py uses the trained model to perform inference
  2. openbayes.yaml contains the configuration for a "Python Script Execution"; for more information, see the openbayes cli configuration file

Create Your First Task

:::caution Note This article introduces the process of uploading a code archive via the web interface. However, this is no longer our recommended best practice for creating "Python Scripts". For a better approach, please refer to Getting Started with bayes Command Line Tool. :::

After the dataset is prepared, click "New Compute Container" in the left navigation bar to create a Python script execution task, using train.py to train a TensorFlow handwriting recognition model.

On the "New Container" page, select "Python Script Execution" and enter the following command in the execution command field:

python train.py -o /openbayes/home -e 50 -m model.h5 -l ./tf_dir

Where:

  1. train.py is the Python script used for model training
  2. -o /openbayes/home sets the output directory to /openbayes/home. In Python script execution mode, the system saves and uploads only the results under /openbayes/home, so all working results should be written to this directory; files written to other directories, including the default . directory, will not be preserved
  3. -e 50 sets the number of training epochs
  4. -m model.h5 sets the filename of the saved model; combined with -o /openbayes/home, the trained model will be stored at /openbayes/home/model.h5
  5. -l ./tf_dir sets ./tf_dir as the log directory for TensorBoard, which HyperAI reads by default

Parameter parsing in train.py uses Python's built-in library argparse. For more information, see argparse — Parser for command-line options, arguments and sub-commands.
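The flags in the command above can be declared with argparse roughly as follows. The long option names and help strings are assumptions for illustration; the actual definitions live in the repository's train.py:

```python
import argparse

# Sketch of an argparse setup matching the documented flags
# -o, -e, -m, and -l. Long names and defaults are illustrative.
parser = argparse.ArgumentParser(description="Train an MNIST model")
parser.add_argument("-o", "--output", default="/openbayes/home",
                    help="directory to save results in")
parser.add_argument("-e", "--epochs", type=int, default=50,
                    help="number of training epochs")
parser.add_argument("-m", "--model", default="model.h5",
                    help="filename of the saved model")
parser.add_argument("-l", "--log-dir", default="./tf_dir",
                    help="TensorBoard log directory")

# Parse the same arguments as the documented command:
args = parser.parse_args("-o /openbayes/home -e 50 -m model.h5 -l ./tf_dir".split())
print(args.output, args.epochs, args.model, args.log_dir)
```

Note that argparse converts the hyphen in `--log-dir` to an underscore, so the value is read as `args.log_dir`.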

In the next step, select CPU for compute, choose "tensorflow-2.8" for the image, select "Python Script Execution" as the access method, and upload the train.py file from the directory.

After submission, wait about 15 seconds and the task will begin execution. The task startup time usually depends on the size of the bound dataset: the larger the dataset, the longer the container takes to prepare. On the container page, you can see the execution status displayed in the logs.

Click "TensorBoard Visualization" to view the executing model information through TensorBoard.

View Execution Directory and Output Results

After execution is complete, click the "Working Directory" tab on the container page to see that the specified model has been created.

Use the Trained Model for Classification

After obtaining the trained model, we can use it to classify and predict data. On the completed execution page, click "Continue Execution" in the upper right corner to enter the new execution build page.

Here, bind the "Working Directory" from the previous execution to /openbayes/home. For more details about "Continue Execution", see Continue Execution for Compute Containers.

Modify the specified execution command:

python inference.py -m /openbayes/home/model.h5

Where -m /openbayes/home/model.h5 specifies the path of the model to load. After the task is submitted and finishes executing, you can see in the logs that about 98% accuracy was achieved on the 10,000 test images.
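As a sketch of what such an inference script can look like (the actual inference.py in the repository may differ), the CLI below accepts the documented `-m` flag and evaluates a saved Keras model on the MNIST test set. The TensorFlow calls are kept inside a function so the file can be imported even where TensorFlow is not installed:

```python
import argparse

def build_parser():
    # CLI matching the documented command:
    #   python inference.py -m /openbayes/home/model.h5
    parser = argparse.ArgumentParser(description="Evaluate a trained MNIST model")
    parser.add_argument("-m", "--model", required=True,
                        help="path to the saved Keras model (.h5)")
    return parser

def evaluate(model_path):
    # TensorFlow is imported lazily so that argument parsing above
    # works without it installed.
    import tensorflow as tf
    (_, _), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    # Assumes the model was trained on inputs scaled to [0, 1].
    x_test = x_test / 255.0
    model = tf.keras.models.load_model(model_path)
    loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
    return accuracy

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"test accuracy: {evaluate(args.model):.4f}")
```

With a model trained as described above, the printed accuracy should correspond to the roughly 98% figure seen in the logs.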