RStudio User Guide

This tutorial shows how to get started with RStudio for high-performance computing on the HyperAI platform.

Create Container

Click "High Performance Computing" in the left sidebar, then create a new container
Select the required computing power
Select the desired payment method under "Billing Method"
Select the required software under "Select Image", using "rstudio" as an example here
Enter a valid container name under "Container Name"
Select the data repository to bind under "Data Binding", or skip if there is no data to bind
Click "Execute"
Wait for the container to allocate resources. Once the status changes to "Running", click the link under "API Address".

Username and Password

You need to enter a username and password before you can access the RStudio Server page.

Username: rstudio
Password: rstudio

1. Enter RStudio Working Directory

This tutorial uses simulated data from a depression scale commonly used in psychology to demonstrate the analysis. Get sample dataset.

Open the RStudio Server page. It works the same as a locally installed RStudio; the only difference is the working directory, which is shown in the lower right corner of the page. The default working directory is /home/rstudio. Enter the following command to view the current working directory:

getwd()

2. Change Working Directory

For convenient data analysis, enter the following code to change the current RStudio working directory to ~/home, the home folder inside /home/rstudio:

setwd("~/home")

Create a data folder and an output folder under this home folder, so that raw data, output results, and source code files are all stored in one place.
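
The two folders can also be created from the R console instead of the Files pane. The following is a minimal sketch, assuming the ~/home layout used in this tutorial; the helper name create_project_dirs is hypothetical and not part of RStudio or HyperAI.

```r
# Hypothetical helper: create "data" and "output" under a base directory.
create_project_dirs <- function(base = "~/home") {
  dirs <- file.path(path.expand(base), c("data", "output"))
  for (d in dirs) {
    # recursive = TRUE also creates missing parent folders;
    # showWarnings = FALSE keeps the call quiet if the folder already exists
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  dirs[dir.exists(dirs)]  # return the folders that now exist
}
```

Calling create_project_dirs("~/home") returns the paths of the folders that exist afterwards, so an empty result signals that the base directory is not writable.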

3. Data Preparation

3.1 Upload Data

Get sample dataset

Method 1: Upload the data to your dataset in advance and bind it when creating the container. For details, refer to Computing Container Data Binding
Method 2: Click "upload" to upload the prepared data files to the current working directory

3.2 Reading Data

Use the read_excel function from the readxl package to read the first sheet of the prepared PHQ.xlsx file.

library(readxl)
df <- read_excel("~/home/data/PHQ.xlsx", sheet = 1) # sheet = 1 selects the first sheet

After reading is complete, run the following code to display the first 6 rows.

head(df)
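
Before reading, it can help to confirm that the file exists and to see which sheets it contains. The following is a small sketch that assumes the file path used in this tutorial; it only reads the file when the readxl package and the file are both available.

```r
path <- path.expand("~/home/data/PHQ.xlsx")  # path used in this tutorial

if (requireNamespace("readxl", quietly = TRUE) && file.exists(path)) {
  print(readxl::excel_sheets(path))             # names of all sheets, in order
  df_check <- readxl::read_excel(path, sheet = 1)  # sheet can be a position or a name
  print(dim(df_check))                          # number of rows and columns
} else {
  message("readxl is not installed or the file was not found: ", path)
}
```

If the sheet names show that the data sits on a different sheet, pass that sheet's name or position to the sheet argument.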

Perform preliminary data preprocessing and check:

Data types;
Categorization and factorization;
Whether there are missing values.

Categorization and factorization:

df$gender <- factor(df$gender, ordered = TRUE) # Assign the result back, otherwise df is unchanged
df$grade <- factor(df$grade, ordered = TRUE)

Data types:

str(df) # Confirm the data type of each column

Check for missing values:

sum(is.na(df)) # Count the missing values in the whole data frame
df <- na.omit(df) # If there are missing values, drop the rows that contain them
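
A single total does not show where the missing values are. The sketch below uses a small made-up data frame (not the tutorial data) to count missing values per column and to find the affected rows before dropping them.

```r
# Toy data standing in for the PHQ data frame (values are made up)
toy <- data.frame(
  id = 1:4,
  q1 = c(0, 1, NA, 3),
  q2 = c(2, NA, NA, 1)
)

na_per_column <- colSums(is.na(toy))         # missing values in each column
rows_with_na <- which(!complete.cases(toy))  # row numbers with at least one NA
toy_clean <- na.omit(toy)                    # keep only the complete rows

na_per_column
rows_with_na
nrow(toy_clean)
```

Checking the per-column counts first makes it easier to decide whether dropping rows is acceptable or whether one column is responsible for most of the gaps.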

4. Data Analysis

4.1 Calculate Total Scale Score

Enter the following command to use the apply function to sum columns 4-12 by row.

df$phq <- apply(df[4:12], 1, sum) # Sum the item columns (4 to 12) for each row
head(df)
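
Row-wise sums can also be computed with the base R function rowSums. The sketch below uses a made-up data frame with three item columns (the tutorial data uses columns 4 to 12) and checks that both approaches give the same totals.

```r
# Toy data: three identifier columns followed by three item columns (made up)
toy <- data.frame(
  id = 1:3, gender = c("F", "M", "F"), grade = c(1, 2, 3),
  q1 = c(0, 1, 3), q2 = c(1, 1, 3), q3 = c(2, 0, 3)
)
item_cols <- 4:6  # in the tutorial data this would be 4:12

total_apply <- apply(toy[item_cols], 1, sum)  # approach used in this tutorial
total_rowsums <- rowSums(toy[item_cols])      # vectorized base R alternative

all.equal(unname(total_apply), unname(total_rowsums))
```

rowSums also accepts na.rm = TRUE, which may be useful if rows with missing item scores have not been removed.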

4.2 Use the cut Function for Result Level Classification

After the total scores are calculated, they need to be classified. The PHQ scale classification standard used here is: 0-4 points normal, 5-9 points mild depression, 10-14 points moderate depression, 15-19 points moderately severe depression, 20-27 points severe depression. Enter the following command to classify the scores according to this standard.

# include.lowest = TRUE keeps a total score of 0 in "Normal" instead of turning it into NA
df$level <- cut(df$phq, breaks = c(0, 4, 9, 14, 19, 27), labels = c("Normal", "Mild", "Moderate", "Moderately Severe", "Severe"), include.lowest = TRUE)
df
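
Interval boundaries in cut are easy to get wrong, so it can help to try the breaks on boundary scores first. The sketch below uses made-up scores. By default, cut leaves the lowest break out of the first interval, and include.lowest = TRUE closes it.

```r
breaks <- c(0, 4, 9, 14, 19, 27)
labels <- c("Normal", "Mild", "Moderate", "Moderately Severe", "Severe")
scores <- c(0, 4, 5, 9, 10, 14, 15, 19, 20, 27)  # boundary values (made up)

# Default: intervals are (0,4], (4,9], ... so a score of 0 falls outside and becomes NA
default_levels <- cut(scores, breaks, labels = labels)

# include.lowest = TRUE: the first interval becomes [0,4], so 0 is classified
closed_levels <- cut(scores, breaks, labels = labels, include.lowest = TRUE)

data.frame(score = scores, default = default_levels, closed = closed_levels)
```

The resulting table shows each boundary score next to both classifications, which makes an unexpected NA easy to spot.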

4.3 Use the psych Package for Descriptive Statistics

After obtaining the classification results, compute descriptive statistics for the data. This requires the "psych" package. Enter the following command to install it.

install.packages("psych")

After installation is complete, enter the following commands to load "psych" and compute the descriptive statistics.

library(psych)
phqdescri <- psych::describe(df)
phqdescri
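
To avoid reinstalling packages that are already present in the container, the installation can be made conditional. This is a sketch only; the helper name missing_packages is hypothetical, and the install call is left commented out because it needs network access.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed yet
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("readxl", "psych", "writexl")  # packages used in this tutorial
to_install <- missing_packages(needed)

if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install the missing packages
}
```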

4.4 Quantitative Summary Analysis of Different Score Levels

Use the table function to count how many participants fall into each score level. Combined with addmargins, it can also summarize the distribution of levels by grade and gender, with row and column totals.

First, enter the following command to summarize the quantity of different score levels.

levelphq <- table(df$level) #Quantity of each level
levelphq

Then enter the following commands respectively to summarize the quantity distribution under different levels by gender and grade.

library(magrittr) # Provides the %>% pipe (assumes magrittr is installed; dplyr also provides it)

genderlevel <- df %>%
  subset(select = c(gender, level)) %>%
  table() %>%
  addmargins() # Distribution of depression levels by gender
genderlevel

gradelevel <- df %>%
  subset(select = c(grade, level)) %>%
  table() %>%
  addmargins() # Distribution of depression levels by grade
gradelevel
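
The same kind of cross-tabulation can be tried on a small made-up data frame using base R only. The sketch below also shows prop.table, which turns the counts into within-group proportions; the data values are illustrative and not taken from the tutorial dataset.

```r
# Toy data (made up): gender and depression level for five participants
toy <- data.frame(
  gender = c("F", "F", "M", "M", "M"),
  level  = factor(c("Normal", "Mild", "Normal", "Normal", "Mild"),
                  levels = c("Normal", "Mild"))
)

counts <- table(toy$gender, toy$level, dnn = c("gender", "level"))
with_totals <- addmargins(counts)            # adds a "Sum" row and a "Sum" column
row_props <- prop.table(counts, margin = 1)  # share of each level within a gender

with_totals
round(row_props, 2)
```

Proportions are often easier to compare than raw counts when the groups differ in size.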

4.5 Calculate Score Means Using the describeBy Function from the psych Package

Enter the following commands to calculate descriptive statistics of the total score (including the mean) by gender and by grade:

genderDescri <- psych::describeBy(df[c("phq")],
                  list(df$gender)) # Score differences by gender
genderDescri

gradeDescri <- psych::describeBy(df[c("phq")],
                  list(df$grade)) # Score differences by grade
gradeDescri
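
Group statistics can be cross-checked with base R. The sketch below computes the group size, mean, and standard deviation per group on made-up data; it is an independent check, not a replacement for describeBy.

```r
# Toy data (made up): total scores for two groups
toy <- data.frame(
  gender = c("F", "F", "M", "M"),
  phq    = c(4, 10, 6, 12)
)

# split() makes one score vector per group; sapply() summarizes each of them
summary_by_gender <- sapply(
  split(toy$phq, toy$gender),
  function(x) c(n = length(x), mean = mean(x), sd = sd(x))
)

t(summary_by_gender)  # one row per group, one column per statistic
```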

4.6 Conduct Reliability Analysis Using the alpha Function from the psych Package

# Reliability analysis
library(psych)
phqr <- alpha(df[,c(4:12)])
phqr
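
For reference, raw Cronbach's alpha for k items is k / (k - 1) * (1 - sum of item variances / variance of the total score). The sketch below computes it by hand on made-up item scores. The helper name cronbach_alpha is hypothetical, and the helper drops incomplete rows, so its result may differ from the psych output when the data contain missing values.

```r
# Hypothetical helper: raw Cronbach's alpha from a data frame of item scores
cronbach_alpha <- function(items) {
  items <- na.omit(as.data.frame(items))  # use complete rows only
  k <- ncol(items)                        # number of items
  item_vars <- sapply(items, var)         # variance of each item
  total_var <- var(rowSums(items))        # variance of the total score
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# Toy item scores (made up), three items and five respondents
toy_items <- data.frame(
  q1 = c(0, 1, 2, 3, 1),
  q2 = c(0, 2, 2, 3, 1),
  q3 = c(1, 1, 3, 3, 0)
)
cronbach_alpha(toy_items)
```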

4.7 Calculate Scale Reliability Using Item-Total Correlation

The "corr.test" function in the "psych" package can be used to compute the correlation between each item and the total score, together with the corresponding p-values.

rr <- corr.test(df[,c(4:12)],df$phq)
rr$r # Extract correlation coefficients

rr$p # Extract p-values

resultphq <- round(rr$p,3)# Save results, set decimal places to 3
colnames(resultphq) <- c("phq")# Modify column name in results to phq
resultphq
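
Because the total score includes the item itself, the item-total correlation is somewhat inflated. A common alternative is the corrected item-total correlation, which correlates each item with the total of the remaining items. The sketch below uses made-up data and base R only; the helper name corrected_item_total is hypothetical. The alpha function in psych reports a comparable statistic (r.drop in its item statistics).

```r
# Hypothetical helper: correlate each item with the total of the other items
corrected_item_total <- function(items) {
  items <- as.data.frame(items)
  total <- rowSums(items)
  sapply(names(items), function(nm) cor(items[[nm]], total - items[[nm]]))
}

# Toy item scores (made up)
toy_items <- data.frame(
  q1 = c(0, 1, 2, 3, 1),
  q2 = c(0, 2, 2, 3, 1),
  q3 = c(1, 1, 3, 3, 0)
)

uncorrected <- sapply(toy_items, cor, y = rowSums(toy_items))  # item vs. full total
corrected <- corrected_item_total(toy_items)                   # item vs. rest of scale

round(rbind(uncorrected, corrected), 3)
```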

Saving Results

To save results, you need to use the "writexl" package. Run the following command to install it.

install.packages("writexl")

After installation is complete, enter the following command to load it.

library(writexl)

In the above analysis, each step saves its result under a designated name. Create a list of these named objects, then use the sink function to redirect the printed output to a text file in the "output" folder created earlier. Note that sink does not create the folder, so it must already exist.

result_list <- list(df,phqdescri,genderDescri,gradeDescri,levelphq,genderlevel,gradelevel,phqr,resultphq)
sink("~/home/output/output.txt")
print(result_list)
sink()
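
Results that are data frames can also be written to an Excel file with writexl. The sketch below uses made-up tables and a temporary file path as stand-ins. In the tutorial layout the path could be a file under ~/home/output, and objects such as tables would first need to be converted with as.data.frame.

```r
# Toy tables standing in for the analysis results (values are made up)
results <- list(
  scores = data.frame(id = 1:3, phq = c(3, 12, 21)),
  levels = data.frame(level = c("Normal", "Moderate", "Severe"), n = c(1, 1, 1))
)

out_file <- file.path(tempdir(), "phq_results.xlsx")  # stand-in for a file in the output folder

if (requireNamespace("writexl", quietly = TRUE)) {
  # A named list of data frames is written as one sheet per list element
  writexl::write_xlsx(results, path = out_file)
}
```

Unlike the text file produced by sink, the Excel file keeps each table in its own sheet, which makes later reuse of the numbers easier.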

Finally, you can return to the console interface to view the output file.