RStudio User Guide
Launch RStudio on HyperAI
This tutorial will introduce you to getting started with RStudio for high-performance computing on the HyperAI platform.
Create Container
- Click "High Performance Computing" on the left sidebar and create a new container
- Select the required computing power
- Select the desired payment method under "Billing Method"
- Select the required software under "Select Image", using "rstudio" as an example here
- Enter a valid container name under "Container Name"
- Select the data repository to bind under "Data Binding", or skip if there is no data to bind
- Click "Execute"
- Wait for the container to allocate resources. Once the status changes to "Running", click the link under "API Address".
Username and Password
You need to enter a username and password before you can access the RStudio Server page.
- Username: rstudio
- Password: rstudio
Enter RStudio Working Directory
This tutorial uses simulated data from a commonly used depression scale in psychology for analysis demonstration. Get sample dataset.
1. Enter RStudio Server Page
The RStudio Server page works the same as a locally installed RStudio; the only difference is the working directory, which is shown in the lower right corner of the page. Enter the following command to view the current working directory, which is /home/rstudio by default:
getwd()
2. Change Working Directory
For convenient data analysis, enter the following code to change the current working directory to the home folder (~/home):
setwd("~/home")
Create a data folder and an output folder under the home directory, so that raw data, output results, and source code files are all stored in the home folder.
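The two folders can also be created from the R console instead of the Files pane. The following is a minimal sketch using base R only; `base_dir` is a hypothetical stand-in that points at a temporary folder so the example can run anywhere.

```r
# Hypothetical base directory; a temporary folder stands in for ~/home here.
base_dir <- file.path(tempdir(), "home")

# Create the data and output folders (and base_dir itself if it is missing).
for (sub in c("data", "output")) {
  dir.create(file.path(base_dir, sub), recursive = TRUE, showWarnings = FALSE)
}

# List the folders that now exist directly under base_dir.
created <- list.dirs(base_dir, full.names = FALSE, recursive = FALSE)
print(created)
```

In the container, setting `base_dir` to "~/home" should create ~/home/data and ~/home/output, assuming the home folder is writable.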
3. Data Preparation
3.1 Upload Data
- Method 1: Upload the data to your dataset in advance and bind it when launching the container. For details, refer to Computing Container Data Binding
- Method 2: Click "upload" to upload the prepared data files to the current working directory
3.2 Reading Data
Use the read_excel function from the readxl package to read the first sheet of the prepared PHQ.xlsx file.
library(readxl)
df <- read_excel("~/home/data/PHQ.xlsx", 1)
After reading is complete, run the following code to view the first 6 rows.
head(df)
Perform preliminary data preprocessing and check:
- Data types;
- Categorization and factorization;
- Whether there are missing values.
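The three checks in this list can be tried out first on a small data frame. This is a sketch under assumptions: `toy` and its values are made up for illustration, and only the column names gender and grade mirror the tutorial's data.

```r
# Hypothetical toy data; the values are made up for illustration.
toy <- data.frame(
  gender = c("F", "M", "F", "M"),
  grade  = c(1, 2, 2, 3),
  item1  = c(0, 2, NA, 3),
  stringsAsFactors = FALSE
)

# Data types: str() prints one line per column.
str(toy)

# Factorization: assign the result back, otherwise the column stays unchanged.
toy$gender <- factor(toy$gender)
toy$grade  <- factor(toy$grade, ordered = TRUE)

# Missing values: count them, then drop the rows that contain any.
n_missing <- sum(is.na(toy))
toy_complete <- na.omit(toy)
print(n_missing)
print(nrow(toy_complete))
```

Note that factor() and na.omit() return new objects rather than modifying their input, so the result has to be assigned for the change to persist.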
Categorization and factorization:
df$gender <- factor(df$gender, ordered = TRUE) # Assign the result back to the column
df$grade <- factor(df$grade, ordered = TRUE)
Data types:
str(df) # Confirm the data type of each column
Check for missing values:
sum(is.na(df)) # Count the missing values
df <- na.omit(df) # If there are missing values, drop the rows that contain them
4. Data Analysis
4.1 Calculate Total Scale Score
Enter the following command to use the apply function to sum columns 4-12 by row.
df$phq <- apply(df[c(4:12)],1,sum)
head(df)
4.2 Use the cut Function for Result Level Classification
After obtaining the total scores, classify them by level. The PHQ scale classification standard used here is: 0-4 points normal, 5-9 points mild depression, 10-14 points moderate depression, 15-19 points moderately severe depression, and 20-27 points severe depression. The cut function classifies the scores according to this standard.
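How cut treats the interval boundaries is easy to get wrong, so here is a minimal sketch on hypothetical scores that sit exactly on each boundary. The vector `scores` is made up; the breaks and labels are the ones from the classification standard.

```r
# Hypothetical total scores covering every boundary of the standard.
scores <- c(0, 4, 5, 9, 10, 14, 15, 19, 20, 27)

breaks <- c(0, 4, 9, 14, 19, 27)
labels <- c("Normal", "Mild", "Moderate", "Moderately Severe", "Severe")

# Intervals are right-closed by default: (0,4], (4,9], ...
# include.lowest = TRUE closes the first one to [0,4], so a score of 0 is kept.
level <- cut(scores, breaks, labels = labels, include.lowest = TRUE)

# Without include.lowest, 0 falls outside (0,4] and becomes NA.
level_default <- cut(scores, breaks, labels = labels)

print(data.frame(scores, level, level_default))
```

Because every item can score 0, a total of 0 is possible, which is why the lowest boundary deserves attention.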
df$level <- cut(df$phq, c(0,4,9,14,19,27), include.lowest = TRUE, labels = c("Normal","Mild","Moderate","Moderately Severe","Severe")) # include.lowest = TRUE keeps a score of 0 in "Normal" instead of NA
df
4.3 Use the psych Package for Descriptive Statistics
After obtaining the classification results, compute descriptive statistics for the data. First, install the "psych" package by entering the following command.
install.packages("psych")
After installation is complete, enter the following command to load "psych".
library(psych)
Then compute and view the descriptive statistics.
phqdescri <- psych::describe(df)
phqdescri
4.4 Quantitative Summary Analysis of Different Score Levels
Use table to count the number of respondents at each level, and table together with addmargins to summarize the distribution of levels by gender and by grade.
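As an illustration of what table and addmargins return, here is a minimal sketch on a hypothetical data frame of six respondents. The data are made up, and only two levels are used to keep the output small.

```r
# Hypothetical toy data: six respondents with a gender and a depression level.
toy <- data.frame(
  gender = c("F", "F", "M", "M", "F", "M"),
  level  = factor(c("Normal", "Mild", "Normal", "Normal", "Mild", "Mild"),
                  levels = c("Normal", "Mild"))
)

# One-way table: number of respondents at each level.
level_counts <- table(toy$level)
print(level_counts)

# Two-way table, with row and column totals added by addmargins().
gender_by_level <- addmargins(table(toy[, c("gender", "level")]))
print(gender_by_level)
```

The totals appear in an extra row and column named "Sum", which makes it easy to check that the cell counts add up to the sample size.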
First, enter the following command to summarize the quantity of different score levels.
levelphq <- table(df$level) #Quantity of each level
levelphq
Then enter the following commands to summarize the distribution of levels by gender and by grade.
library(magrittr) # Provides the %>% pipe
genderlevel <- df %>%
  subset(select=c(gender,level)) %>%
  table() %>% addmargins() # Distribution of depression levels by gender
genderlevel
gradelevel <- df %>%
  subset(select=c(grade,level)) %>%
  table() %>% addmargins() # Distribution of depression levels by grade
gradelevel
4.5 Calculate Score Means Using the describeBy Function from the psych Package
Enter the following command to start calculating:
genderDescri <- psych::describeBy(df[c("phq")],
                  list(df$gender))# Score differences by gender
genderDescri
gradeDescri <- psych::describeBy(df[c("phq")],
                  list(df$grade)) # Score differences by grade
gradeDescri
4.6 Conduct Reliability Analysis Using the alpha Function from the psych Package
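As background, the alpha function in psych reports Cronbach's alpha among other statistics. The textbook formula can be sketched by hand in base R; the item scores below are hypothetical, and this sketch is not a substitute for the fuller output of psych.

```r
# Hypothetical item scores for six respondents (values are made up).
items <- data.frame(
  q1 = c(0, 1, 2, 3, 1, 2),
  q2 = c(0, 1, 3, 3, 1, 1),
  q3 = c(1, 1, 2, 3, 0, 2)
)

k <- ncol(items)                    # Number of items
item_var <- sapply(items, var)      # Variance of each item
total_var <- var(rowSums(items))    # Variance of the total score

# Cronbach's alpha: k / (k - 1) * (1 - sum of item variances / total variance)
alpha_hat <- k / (k - 1) * (1 - sum(item_var) / total_var)
print(round(alpha_hat, 3))
```

The value rises as the items covary more strongly, because the variance of the total score then grows relative to the sum of the item variances.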
# Reliability analysis
library(psych)
phqr <- alpha(df[,c(4:12)])
phqr
4.7 Calculate Scale Reliability Using Item-Total Correlation
The corr.test function in the "psych" package can be used to compute the correlation between each item and the total score as a reliability check.
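To illustrate what an item-total correlation is, here is a minimal base R sketch on hypothetical item scores. It uses cor rather than corr.test, so it returns coefficients only, with no p-values. It also shows the corrected variant, in which each item is removed from the total before correlating.

```r
# Hypothetical item scores for six respondents (values are made up).
items <- data.frame(
  q1 = c(0, 1, 2, 3, 1, 2),
  q2 = c(0, 1, 3, 3, 1, 1),
  q3 = c(1, 1, 2, 3, 0, 2)
)
total <- rowSums(items)

# Item-total correlation: each item against the total score.
item_total <- sapply(items, function(x) cor(x, total))

# Corrected variant: each item against the total of the remaining items,
# so that the item is not correlated with itself.
item_rest <- sapply(names(items), function(nm) cor(items[[nm]], total - items[[nm]]))

print(round(cbind(item_total, item_rest), 3))
```

The uncorrected coefficient is inflated because the item is part of the total, and the inflation is larger for short scales.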
rr <- corr.test(df[,c(4:12)],df$phq)
rr$r # Extract correlation coefficients
rr$p # Extract p-values
resultphq <- round(rr$p,3) # Save results, set decimal places to 3
colnames(resultphq) <- c("phq") # Modify column name in results to phq
resultphq
Saving Results
The "writexl" package can be used to save results as Excel files. Run the following command to install it.
install.packages("writexl")
After installation is complete, enter the following command to load it.
library(writexl)
In the above analysis, each step stores its result under a designated name. Collect these named objects in a list, then use the sink function to redirect the printed output to a text file in the "output" folder created earlier.
result_list <- list(df,phqdescri,genderDescri,gradeDescri,levelphq,genderlevel,gradelevel,phqr,resultphq)
sink("~/home/output/output.txt")
print(result_list)
sink()
Finally, you can return to the console interface to view the output file.
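The sink pattern can be checked from within R by reading the file back. The following is a minimal sketch under assumptions: the contents of `result_list` are hypothetical, and a temporary file stands in for ~/home/output/output.txt.

```r
# Hypothetical results; a temporary file stands in for ~/home/output/output.txt.
result_list <- list(counts = table(c("Mild", "Normal", "Mild")), mean_score = 7.5)
out_file <- file.path(tempdir(), "output.txt")

sink(out_file)        # Start redirecting printed output to the file
print(result_list)
sink()                # Stop redirecting; output returns to the console

# Read the file back to confirm what was written.
written <- readLines(out_file)
cat(written, sep = "\n")
```

sink writes into an existing folder but does not create one, so the output folder has to exist before the first call.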