---
title             : "Predicting English Language Proficiency Scores from Learner Background Variables: The role of gender, age, and years abroad in test scores."
shorttitle        : "Predicting English Language Proficiency Scores from Learner Background Variables"

author:
  - name          : "Ryan Barnes"
    institution   : "1"

affiliation:
  - id            : "1"
    institution   : "Nagoya Gakuin University"

bibliography      : ["./references.bib"]

header-includes:
#  - \usepackage[american]{babel} # or not, has caused errors in the past
  - \setmainfont{Times New Roman}
  - \usepackage[style=apa]{biblatex}
#  - \usepackage{setspace}  # added from here, may not be necessary
#  - \usepackage{xurl}
#  - \usepackage{enumitem}
#  - \usepackage{breakcites}


floatsintext      : yes
figurelist        : no
tablelist         : no
footnotelist      : no
linenumbers       : no
mask              : no
draft             : no
nocite            : |
  @allaire2021rmarkdown

csl               : "`r system.file('rmd', 'apa7.csl', package = 'papaja')`"
documentclass     : "apa7"
classoption       : "man"
output            :
  papaja::apa6_pdf:
    latex_engine  : lualatex

abstract          : "Demographic factors, such as learner background variables, are strongly correlated with aptitude test scores of international English as a Second Language learners. This study conducted a multiple linear regression to assess the role of gender, length of study abroad, and age of international college students enrolled in an English for Academic Purposes program at a Midwestern university in the United States (n=626) in predicting scores on a test of English proficiency. Regression results were significant, although the explanatory power of the overall model was weak. These results suggest that certain learner backgrounds play a role in predicting test performance, and may be of interest to language program designers and classroom teachers in placing students and planning English as a Second Language courses."
---
```{r global_options, include=FALSE}
knitr::opts_chunk$set(fig.pos = 'H')
# options(tinytex.verbose = TRUE)
# tinytex::tlmgr_install("haranoaji") if it crashes
```

```{r setup, include = FALSE}
# The knitr chunk option is `warning` (singular); `warnings` is silently ignored
knitr::opts_chunk$set(echo = F, warning = F, message = F)

if(!"tinytex" %in% rownames(installed.packages()))
  {
  install.packages("tinytex")
  tinytex::install_tinytex()
  }
if(!"devtools" %in% rownames(installed.packages()))
  install.packages("devtools")
if(!"papaja" %in% rownames(installed.packages()))
  devtools::install_github("crsh/papaja")
if(!"skimr" %in% rownames(installed.packages()))
  devtools::install_github("ropensci/skimr")
if(!"pacman" %in% rownames(installed.packages()))
  install.packages("pacman")

pacman::p_load(papaja,  # For APA format
               tidyverse,
               knitr,
               measurements,
               kableExtra,
               magick,
               psych,
               rvest,
               rms,  # for lrm
               stargazer,
               skimr)
```

```{r data_collection}
studentinfo = read.csv("./PELIC-dataset/corpus_files/student_information.csv")
testscores = read.csv("./PELIC-dataset/corpus_files/test_scores.csv")

df = left_join(studentinfo, testscores)
df = df %>% drop_na %>%
  rename(`Years Abroad`=yrs_in_english_environment,
         Gender=gender,
         Age=age,
         `MTELP Total Combined Score`=MTELP_Conv_Score) %>%
  filter(Age < 65 & `MTELP Total Combined Score` > 2)
#  filter(Age < 65)

df$Gender = factor(df$Gender)
df$`Years Abroad` = factor(df$`Years Abroad`,
                           levels = c('none',
                                      'less than 1 year',
                                      '1-2 years',
                                      '3-5 years',
                                      'more than 5 years'))
```
# Introduction
Although cognitive factors [@nakata2008English; @cross2010Raising; @zarei2013effect] and psychological factors [@masgoret2003Attitudes; @engin2009Second; @sampson2015Tracing] have been extensively studied in the literature for second language acquisition, recent studies [e.g., @vandergrift2015Learner; @olivares2012Foreign; @vanderslik2015gender] have called for examining the role of learner background variables as well. Learner backgrounds are a demographic subset of learner variables and can include details such as gender, age, native language, and length of time abroad in an English-speaking environment.

The [Michigan Test of English Language Proficiency (MTELP Series)](https://en.wikipedia.org/wiki/MTELP_Series) is a test to measure language achievement and language acquisition of English language learners studying in an institutional language program. The exam is often used in determining placement for students in an appropriate English as a Second Language (ESL) level [@hille2020Placement; @walter2014Dimensionality; @zhang2010Assessing], and tests listening comprehension, reading comprehension, grammatical knowledge and vocabulary range. This study will examine the role that learner background variables have in predicting the weighted, cumulative, converted score of this test as a holistic score of language proficiency.

# Literature review
In language learning contexts, learner background variables refer to individual characteristics and experiences that can influence language learning. In the literature, these variables frequently overlap with individual differences [@gradman1991Language; @oxford1992Second], but often center on non-changeable characteristics such as age, native language, gender, educational history, and length of stay in a country where the target language is spoken [@spurling1985impact].

Learner background variables that have been found to be associated with proficiency outcomes in second language development include gender, where there was a gender gap between females and males in both written and spoken second language proficiency [@nyikos1990Sexrelated; @cangaalonso1970Receptive; @agustinllach2012Vocabulary]; study abroad experience, where students made significant gains in a second language environment compared to their peers who studied at their home university [@segalowitz2004comparison; @llanes2009short; @vandeberg2009Georgetown]; and age [@singleton2001Age; @nikolov2006Recent], where the claim that younger learners learn second languages better than older learners remains hotly debated [e.g., @munoz2011critical]. In this study, I hypothesized that the three variables of length of study abroad experience, gender, and age would significantly predict test scores, in line with the previous studies mentioned above.

## Research Question
- RQ: Do gender, length of study abroad experience, or age predict test scores for a standardized test for English language proficiency?

# Methodology
Data was cloned from the University of Pittsburgh English Language Institute Corpus (PELIC) [GitHub repository](https://github.com/ELI-Data-Mining-Group/PELIC-dataset), "a large learner corpus of written and spoken texts \textellipsis collected in an English for Academic Purposes (EAP) context over seven years in the University of Pittsburgh’s Intensive English Program, and were produced by students with a wide range of linguistic backgrounds and proficiency levels" [@juffs2020University]. The combined dataset includes anonymized learner background variables and test scores for `r nrow(df)` students.

## Measures
The MTELP Combined Score is the weighted, cumulative, converted score of the MTELP Series and was used as the outcome variable. The converted scale ranges from 0 to 100 points. The predictor variables (i.e., Gender, Years Abroad, and Age) were all self-reported. Gender included only binary male and female in this sample. Years Abroad was taken from a drop-down list indicating the number of years in an English-speaking environment and included `r describe(df$'Years Abroad')$values$value[1]`, `r describe(df$'Years Abroad')$values$value[2]`, `r describe(df$'Years Abroad')$values$value[3]`, `r describe(df$'Years Abroad')$values$value[4]`, and `r describe(df$'Years Abroad')$values$value[5]`. Student age at the time of enrollment was measured in years.

## Procedures
Data analysis was conducted using `r R.version$version.string` [@rcoreteam2023language] in RStudio version `r rstudioapi::versionInfo()$version` [@positteam2023RStudio] on a computer running macOS version `r as.numeric_version(system("sw_vers -productVersion", intern = TRUE))`. Data cleaning was completed and missing data were removed using listwise deletion. Data importation, tidying, wrangling, and visualization were done using the tidyverse version `r packageVersion('tidyverse')` [@wickham2019Welcome] collection of R packages. Descriptive statistics and an assessment of assumptions were completed.

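The following model equation was used:
\begin{align*}
MTELP = &\beta_0 + \beta_1X_{\text{Age}} + \beta_2X_{\text{Gender}} + \\
&\beta_3X_{<1\,\text{Year Abroad}} + \beta_4X_{\text{1--2 Years Abroad}} + \\
&\beta_5X_{\text{3--5 Years Abroad}} + \beta_6X_{>5\,\text{Years Abroad}} + \epsilon
\end{align*}
Years Abroad was dummy coded, with "none" serving as the reference category, so it does not receive its own coefficient.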
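
As an illustration of how a categorical predictor such as Years Abroad enters a regression in R, the following minimal sketch builds a design matrix from toy data. The data frame `toy` and its values are hypothetical and are not PELIC records; the sketch assumes R's default treatment contrasts, under which the first factor level is absorbed into the intercept.

```r
# Toy data with hypothetical values (not drawn from the PELIC dataset)
lv <- c("none", "less than 1 year", "1-2 years",
        "3-5 years", "more than 5 years")
toy <- data.frame(
  score = c(55, 62, 48, 70, 66, 59),
  age = c(19, 24, 31, 22, 27, 35),
  gender = factor(c("Female", "Male", "Female", "Male", "Female", "Male")),
  years_abroad = factor(c(lv, "none"), levels = lv)
)

# model.matrix() returns the design matrix that lm() would build:
# one intercept column, one column for age, one dummy for gender,
# and one dummy for each years_abroad level except the first ("none")
X <- model.matrix(score ~ age + gender + years_abroad, data = toy)
colnames(X)
```

Because "none" is the first level, each Years Abroad coefficient is read as a difference from students with no time abroad.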

# Results
The data were analyzed for outliers and evaluated against the assumptions of normality, linearity, homoscedasticity, independence, and multicollinearity, as well as specification and measurement error. The assumptions of homoscedasticity, independence, and linearity were deemed reasonably tenable. Univariate and bivariate analysis identified three outliers. Two of them were unusually high ages (86 and 100 years old). It was suspected that these ages were misreported by the students: one birth year was entered as 19, possibly causing the data entry software to assign the maximum age of 100, and the other birth year was 1938, which may also be a data entry error, as it is not reasonable to expect an 86-year-old to participate in a university-based study abroad program. As a result, these two cases were deleted due to suspected inaccuracies in self-reporting. The third outlier was an MTELP Combined Score of 1. As this score is the weighted, cumulative, converted score of a series of exams, scoring a 1 seemed highly unlikely in this context. Upon reviewing the scores from the series, this Combined Score was inconsistent with points scored on other exams. As a result, this case was deleted due to suspected inaccurate data entry.
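
The plausibility screening described above can be sketched in base R. The toy values below are hypothetical stand-ins rather than PELIC records, and the cutoffs (age under 65, score above 2) are assumed to mirror the filter applied when the data were loaded.

```r
# Hypothetical toy records: two implausible ages and one implausible score
toy <- data.frame(
  age = c(19, 22, 25, 31, 86, 100, 24),
  score = c(55, 61, 1, 70, 58, 49, 66)
)

# Flag any case that falls outside the assumed plausible ranges
flagged <- toy$age >= 65 | toy$score <= 2

# Inspect the flagged cases before deciding whether to drop them
toy[flagged, ]

# Keep only the cases that pass both checks
cleaned <- toy[!flagged, ]
nrow(cleaned)
```

Inspecting the flagged rows before removal keeps the deletion decision tied to the individual cases rather than to the cutoff alone.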

## Descriptive Statistics

```{r descrip}
minscore = min(df$`MTELP Total Combined Score`)
minage = min(df$Age)

maxscore = max(df$`MTELP Total Combined Score`)
maxage = max(df$Age)

meanscore = mean(df$`MTELP Total Combined Score`) %>% round(2)
meanage = mean(df$Age) %>% round(2)

sdscore = sd(df$`MTELP Total Combined Score`) %>% round(2)
sdage = sd(df$Age) %>% round(2)

yabroad.none = describe(df$`Years Abroad`)$values$frequency[1]
yabroad.lessthanone = describe(df$`Years Abroad`)$values$frequency[2]
yabroad.noneorlessthanone = yabroad.none + yabroad.lessthanone

nmales = describe(df$Gender)$values$frequency[2]
nfemales = describe(df$Gender)$values$frequency[1]
```
<!--\nocite because of reference used in appendix-->

Descriptive statistics can be found in Table\ \@ref(tab:summary). MTELP Total Combined Scores ranged from `r minscore` to `r maxscore` with a mean of `r meanscore` (SD=`r sdscore`). Age, measured in years, ranged from `r minage` to `r maxage` with a mean of `r meanage` (SD=`r sdage`). Years Abroad was heavily skewed toward students with less than one year of experience abroad (`r yabroad.noneorlessthanone` of `r nrow(df)` students). As for self-reported gender, males $(n_m = `r nmales`)$ slightly outnumbered females $(n_f = `r nfemales`)$.

```{r summary}
df %>%
  select(Gender, `Years Abroad`, Age, `MTELP Total Combined Score`) %>%
  group_by(`Years Abroad`) %>%
#  drop_na %>%
  psych::describe(.) %>%
  apa_table(caption = 'Summary statistics for each of the observed variables',
            longtable = T,
            landscape = T,
            font_size = 'small')
```

```{r scoresbyage, fig.cap="Scores by age"}
df %>% drop_na() %>%
  # Predictor (Age) on the x-axis, outcome (score) on the y-axis
  ggplot(aes(Age, `MTELP Total Combined Score`)) +
  geom_point() +
  geom_smooth() +
  theme_apa()
```

```{r scoresbygender, fig.cap="Frequency count for scores by gender"}
df %>% drop_na() %>%
  ggplot(aes(`MTELP Total Combined Score`)) +
  geom_bar() +
  facet_grid(~ Gender) +
  theme_apa()
```

## Model Results

```{r regressiontable, results='asis'}
result = lm(`MTELP Total Combined Score`~Age+Gender+`Years Abroad`,
            data = df, na.action = na.exclude)
a = summary(result)
df.lo = a$fstatistic[[2]]
df.hi = a$fstatistic[[3]]
f = a$fstatistic[[1]]
r2 = a$r.squared
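# Index the coefficient matrix by name rather than by raw position
age.test.coef = a$coefficients["Age", "Estimate"]
age.test.p = a$coefficients["Age", "Pr(>|t|)"]
# Row 3 is the Gender dummy ('Gender: Male' in the table); flip the sign to report the female effect
gender.test.coef = a$coefficients[3, "Estimate"] * -1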

# ssres = sum(result$residuals^2)
# ssreg = 82.5

stargazer(result, label = paste0('tab:', knitr::opts_current$get('label')), header = F, title='Results of Multiple Regression Model',
          dep.var.labels=c('MTELP Total Combined Score'),
          covariate.labels=c('Age', 'Gender: Male', 'Years Abroad: Less than 1 year',
                             'Years Abroad: 1--2 years', 'Years Abroad: 3--5 years',
                             'Years Abroad: More than 5 Years'),
          single.row=T)
```
Results of the multiple regression can be seen in Table\ \@ref(tab:regressiontable). The overall model was statistically significant, $F(`r df.lo`, `r df.hi`) = `r round(f, 2)`, p < .001$. The three predictor variables accounted for `r round(r2*100, 1)`% of the variance in test scores ($R^2 = `r round(r2, 3)`$). Each additional year of age was associated with a `r round(age.test.coef, 2)` point increase in test scores ($p = `r round(age.test.p, 2)`$), controlling for all other predictors. Being female was associated with a `r round(gender.test.coef, 2)` point increase in test scores ($p < .001$), controlling for all other predictors. None of the Years Abroad categories significantly predicted a change in test scores, controlling for all other predictors.
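
For readers who want to reproduce how such quantities are read off a fitted model, the following sketch fits the same model structure to simulated data and extracts the estimates by name. The data frame `sim`, its column names, and the simulated effect sizes are all assumptions made for illustration and carry no information about the PELIC results.

```r
set.seed(1)
n <- 100
lv <- c("none", "less than 1 year", "1-2 years",
        "3-5 years", "more than 5 years")

# Simulated predictors (hypothetical; not the study data)
sim <- data.frame(
  age = sample(18:40, n, replace = TRUE),
  gender = factor(rep(c("Female", "Male"), length.out = n)),
  years_abroad = factor(rep(lv, length.out = n), levels = lv)
)
# Simulated outcome with an arbitrary age effect plus noise
sim$score <- 50 + 0.3 * sim$age + rnorm(n, sd = 10)

fit <- summary(lm(score ~ age + gender + years_abroad, data = sim))
coefs <- fit$coefficients  # 7 rows (terms) x 4 columns (statistics)

# Named indexing states which term and which statistic is wanted
age_est <- coefs["age", "Estimate"]
age_p <- coefs["age", "Pr(>|t|)"]
female_est <- -coefs["genderMale", "Estimate"]  # sign flipped: Male is the dummy

# Overall model fit: F statistic with its degrees of freedom, and R-squared
f_stat <- fit$fstatistic[["value"]]
r_squared <- fit$r.squared
```

Named indexing keeps working if a predictor is added or removed, whereas a single positional index into the coefficient matrix depends on the number of rows and silently points at a different cell when the model changes.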

# Discussion
This study explored the relationship between several learner background variables and English language proficiency scores for students in a university study abroad program. While two variables were found to be significant predictors, the low explanatory power of the model indicates that there are other factors or constructs influencing language proficiency that the model does not capture. This specification error provides an opportunity for further research.

I hypothesized that females (see Figure\ \@ref(fig:scoresbygender)) and older students (see Figure\ \@ref(fig:scoresbyage)) would perform better on this language proficiency test, and the data in this sample supported that. There are a number of debates in the literature that seek to explain the gender gap in second language acquisition, where the gap is hypothesized to stem from an interaction of environmental factors, such as learner strategies, motivation, and orientation, with nature-based abilities [e.g., @vanderslik2015gender], or from maturity levels [e.g., @agustinllach2012Vocabulary]. As for older students, reasons may include more experience studying the language, improved learner strategies, or the possibility of improving test scores based on having taken the test multiple times [@barkaoui2017Examining].

Years abroad had little association with the scores. This may be due to the way the data were collected: a drop-down box separates experience into discrete categories rather than having the participant enter a number of years, so responses cluster into ranges. A heavy majority of students (`r round(yabroad.noneorlessthanone/nrow(df)*100)`%) selected “`r describe(df$'Years Abroad')$values$value[1]`” or “`r describe(df$'Years Abroad')$values$value[2]`” as the length of their study abroad experience, possibly skewing the data. Although it is logical to expect that more time abroad in an English-speaking country would improve one’s English [@segalowitz2004comparison], the scores remained consistent regardless of time abroad, as seen in Figure\ \@ref(fig:yearsabroad).

```{r yearsabroad, fig.cap="Scores by years abroad"}
df %>% drop_na() %>%
  ggplot(aes(`Years Abroad`, `MTELP Total Combined Score`)) +
  geom_boxplot() +
  theme_apa()
```

This may also be due to the relatively young age of the participants, as few of them would have had the opportunity to study abroad for more than a few years by the time they reached the median age of `r median(df$Age)` in this sample. Examining data from a wider variety of sources, for example ones that go beyond placement tests that are typically given to learners who are newer in an English-speaking environment, may provide different results.

# Conclusion
This study does not imply causation. The self-reporting of some data, and the constraints on how those data were entered (e.g., binary-only genders, clustering experience abroad into a range of years), may limit the accuracy of the results. To better understand the association between language proficiency test scores and learner variables, future researchers may wish to conduct qualitative research to gain a deeper understanding of the language learning experiences of participants during their time abroad. It may also be interesting to look at other learner variables, such as the students’ native languages and countries, ways of study, or learning environments. Finally, motivational factors are an important element of language acquisition alongside demographic variables [@engin2009Second;@masgoret2003Attitudes;@sampson2015Tracing], and these individual differences are essential in understanding how one learns a new language.

\newpage

# References