I am using the mpg dataset included in the ggplot2 package.

    #load packages
    library(tidyverse)
    library(tidytext)

    #view data
    glimpse(mpg)
    ## Rows: 234
    ## Columns: 11
    ## $ manufacturer <chr> "audi", "audi", "audi", "audi", "audi", "audi", "audi", "…
    ## $ model        <chr> "a4", "a4", "a4", "a4", "a4", "a4", "a4", "a4 quattro", "…
    ## $ displ        <dbl> 1.8, 1.8, 2.0, 2.0, 2.8, 2.8, 3.1, 1.8, 1.8, 2.0, 2.0, 2.…
    ## $ year         <int> 1999, 1999, 2008, 2008, 1999, 1999, 2008, 1999, 1999, 200…
    ## $ cyl          <int> 4, 4, 4, 4, 6, 6, 6, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 8, 8, …
    ## $ trans        <chr> "auto(l5)", "manual(m5)", "manual(m6)", "auto(av)", "auto…
    ## $ drv          <chr> "f", "f", "f", "f", "f", "f", "f", "4", "4", "4", "4", "4…
    ## $ cty          <int> 18, 21, 20, 21, 16, 18, 18, 18, 16, 20, 19, 15, 17, 17, 1…
    ## $ hwy          <int> 29, 29, 31, 30, 26, 26, 27, 26, 25, 28, 27, 25, 25, 25, 2…
    ## $ fl           <chr> "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p…
    ## $ class        <chr> "compact", "compact", "compact", "compact", "compact", "c…

    #count the cars per transmission type and draw a bar chart
    mpg %>%
      group_by(trans) %>%
      count() %>%
      ungroup() %>%
      ggplot(mapping = aes(trans, n)) +
      geom_bar(stat = "identity")
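The counting and plotting steps above can also be written more compactly. This is only a sketch of an equivalent pipeline, not necessarily how the rest of the post proceeds: `count(trans)` stands in for `group_by()` + `count()` + `ungroup()`, and `geom_col()` for `geom_bar(stat = "identity")`. The object names `trans_counts` and `p` are my own.

```r
library(dplyr)
library(ggplot2)

# One row per transmission type, with the number of cars in column n
trans_counts <- mpg %>%
  count(trans)

# geom_col() plots the precomputed counts as bar heights
p <- ggplot(trans_counts, aes(x = trans, y = n)) +
  geom_col()

p
```

Keeping the counts in their own object makes it easy to inspect or reorder them before plotting.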
Jul 13, 2024
Please check McNeish (2018) for details. Here, I am using the NSCH dataset as an example.

    #load packages
    library(haven)
    library(tidyverse)
    library(userfriendlyscience)

    #import data
    data <- read_sav("nsch.sav")

    #clear the current graphics frame and get ready for the next plot
    plot.new()
Apr 8, 2024
Load Packages

    library(fastDummies)
    library(tidyverse)
    library(psych)

Create a Dataset

    # Create a vector of race categories
    race <- c("White", "Black", "Asian", "Hispanic", "Other")

    # Generate random income values (100 cases)
    set.seed(123) # for reproducibility
    income <- round(runif(100, min = 20000, max = 100000), digits = 2)

    # Repeat each race 20 times to get 100 cases
    race <- rep(race, each = 20)

    # Combine race and income into a data frame
    data <- data.frame(race, income)

    # Print the first few rows of the dataset
    print(head(data))
    ##    race   income
    ## 1 White 43006.20
    ## 2 White 83064.41
    ## 3 White 52718.15
    ## 4 White 90641.39
    ## 5 White 95237.38
    ## 6 White 23644.52

Create Dummy Variables

    data <- data %>%
      dummy_cols(select_columns = "race")

Regress Income on Race (African Americans as the Reference Category)

    # race_Black is left out of the formula, so it serves as the reference category
    fit <- lm(income ~ race_Asian + race_Hispanic + race_Other + race_White, data = data)
    summary(fit)
    ##
    ## Call:
    ## lm(formula = income ~ race_Asian + race_Hispanic + race_Other +
    ##     race_White, data = data)
    ##
    ## Residuals:
    ##    Min     1Q Median     3Q    Max
    ## -44169 -19531  -1137  18010  40481
    ##
    ## Coefficients:
    ##               Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)      66138       5066  13.055   <2e-16 ***
    ## race_Asian      -15015       7165  -2.096   0.0388 *
    ## race_Hispanic    -7004       7165  -0.977   0.3308
    ## race_Other       -7173       7165  -1.001   0.3193
    ## race_White       -2073       7165  -0.289   0.7730
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 22660 on 95 degrees of freedom
    ## Multiple R-squared:  0.05237, Adjusted R-squared:  0.01247
    ## F-statistic: 1.313 on 4 and 95 DF,  p-value: 0.2709
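As a cross-check on the dummy-variable regression above, the same model can be fit by giving `lm()` a factor and letting it build the dummies internally. This is a sketch using base R only, not part of the original post. It recreates the simulated data with the same seed, and the name `fit_factor` is my own.

```r
# Recreate the simulated data with the same seed as above
set.seed(123)
income <- round(runif(100, min = 20000, max = 100000), digits = 2)
race <- rep(c("White", "Black", "Asian", "Hispanic", "Other"), each = 20)
data <- data.frame(race, income)

# Convert race to a factor and make "Black" the reference level
data$race <- relevel(factor(data$race), ref = "Black")

# lm() creates one dummy per non-reference level automatically
fit_factor <- lm(income ~ race, data = data)
coef(fit_factor)
```

The intercept is the mean income of the reference group, and each remaining coefficient is that group's difference from the reference mean, which is how the coefficients in the summary above are read as well.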
Feb 15, 2024
Introduction

In this post, I am going to demonstrate how to display Chinese characters in ggplot2.

    #loading necessary packages
    library(rvest)
    library(xml2)
    library(tidyverse)
    library(tidytext)
    library(knitr)

Let’s get our text data by scraping Chien-Ming Wang’s page on Wikipedia. I will not explain the process of web scraping in this post (but will do so in another post). The focus is on displaying Chinese characters in ggplot2.
Sep 10, 2021
Introduction

Recoding values is one of the most common tasks a researcher needs to do before data analysis. I often need to prepare my data in R before using it for advanced statistical analyses in Mplus. In this case, it is important to recode missing values to a specific extreme value (e.g., -999), since this makes it easier for Mplus to recognize and handle missing values. In this post, I will demonstrate one way (Oh yeah! This is the beauty of R: All roads lead to Rome.) to handle the recoding task using the case_when function from the tidyverse. There are different ways to get this job done, but case_when makes the most sense to me. Let’s get started.
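As a minimal sketch of the idea, assuming a small made-up data frame (the names `df`, `id`, and `score` are hypothetical and not from the post), missing values can be recoded to -999 with `case_when` like this:

```r
library(dplyr)

# Hypothetical toy data; column names are for illustration only
df <- tibble(
  id    = 1:5,
  score = c(3, NA, 5, NA, 1)
)

# Replace missing scores with -999 and keep all other values unchanged
df_recoded <- df %>%
  mutate(score = case_when(
    is.na(score) ~ -999,
    TRUE         ~ score
  ))

df_recoded
```

Both branches return doubles here, which matters because `case_when` expects every right-hand side to have a compatible type.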
Aug 26, 2021
Introduction

In this post, I am going to demonstrate how to compute composite scores or means aggregated over multiple items. There are at least two approaches to achieve the goal.
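One common way to compute such a composite is a row-wise mean over the item columns. This is only a sketch and may not be one of the approaches the post goes on to cover. The data frame `items` and its column names are hypothetical.

```r
# Hypothetical item responses; NA marks a skipped item
items <- data.frame(
  item1 = c(4, 2, 5),
  item2 = c(3, NA, 4),
  item3 = c(5, 1, 3)
)

# Columns that make up the scale
item_cols <- c("item1", "item2", "item3")

# Mean of the available items for each respondent
items$composite <- rowMeans(items[, item_cols], na.rm = TRUE)

items
```

With `na.rm = TRUE`, a respondent's composite is the mean of the items they answered. Dropping that argument would instead return `NA` for anyone with a missing item.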
Jul 23, 2021