The algorithms often used by colleges to predict students’ likelihood of graduating can produce less accurate results for Black and Hispanic students compared to their peers, a new study says.
Researchers examined a predictive-analytics model intended to forecast the likelihood that a student attending a four-year university would obtain a bachelor’s degree within eight years of completing high school. The model was more likely to predict failure for Black and Hispanic students who actually succeeded, and more likely to overestimate the success of white and Asian students, the study found.
It incorrectly labeled both Black and Hispanic students as not likely to graduate roughly 20 percent of the time, compared to false-negative rates of 12 percent and 5 percent for white and Asian students, respectively.
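A false-negative rate here is the share of students who actually graduated but whom the model flagged as unlikely to do so. As a minimal illustration of how such rates can be tallied by group — using hypothetical column names and toy data, not the study’s actual code — the calculation looks roughly like this:

```python
import pandas as pd

# Hypothetical records: actual outcomes and model predictions (1 = graduated).
df = pd.DataFrame({
    "race":      ["Black", "Black", "Hispanic", "White", "Asian", "White"],
    "graduated": [1,        1,       1,          1,       1,       0],
    "predicted": [0,        1,       0,          1,       1,       0],
})

# False-negative rate: among students who actually graduated,
# the fraction the model predicted would not.
grads = df[df["graduated"] == 1]
fnr_by_group = (grads["predicted"] == 0).groupby(grads["race"]).mean()
print(fnr_by_group)
```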
Admissions officials, academic advisers, and faculty members use such tools to make decisions like where to allocate resources and whether to admit certain students. The data is also supposed to help administrators and faculty members identify students who need more support.
When these models appear biased against certain racial groups, however, they can have negative implications depending on how the predictions are used. For example, researchers have cautioned against the use of predictive models in admissions decisions. Higher-education experts have warned that predictions could lead to educational tracking that encourages minority students to pursue courses of study that are perceived as less difficult.
The use of predictive analytics at colleges is widespread. Administrators and educational researchers feed historical data, such as the standardized-test scores or demographic information of a group of students, into predictive models, and the models map the relationship between those data points and outcomes, like graduation rates.
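In practice, that mapping is usually a standard supervised-learning model trained on past cohorts and applied to current students. A minimal sketch of the idea, assuming hypothetical file and feature names and using scikit-learn’s logistic regression (the study does not say which model any given institution uses):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical historical records: student features plus the outcome to predict.
data = pd.read_csv("historical_students.csv")   # assumed file
features = ["test_score", "hs_gpa", "pell_eligible", "first_generation"]
X = data[features]
y = data["graduated_in_8_years"]                # 1 = earned a bachelor's degree

# Fit on past cohorts, then score held-out (or current) students.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Predicted probability of graduating for each held-out student.
graduation_probability = model.predict_proba(X_test)[:, 1]
```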
In a video explaining the findings, Denisa Gándara, one of the study’s four authors, said the researchers had expected the models to predict worse outcomes for some students of color, because the nation’s educational system has inequities baked in. What surprised them was that the models were also less accurate for those students.
“In terms of the bias level itself, it wasn’t surprising, but it was amazing that we thought it was biased toward Hispanic and Black students — that’s what we knew from the education domain. It was still impressive to see how these statistical tools can capture that,” Hadis Anahideh, a co-author, said in an interview.
There are techniques for mitigating the biases ingrained in these models, but in this particular study the researchers found them to be generally ineffective. Still, there are steps colleges can take short of ditching predictive models altogether.
In an email to The Chronicle, Gándara suggested that colleges investigate the sources of bias. She said that models may be biased because they exclude variables that are more indicative of student success for minority students, like measures of campus racial climate or cultural relevance in the curriculum. Incorporating those variables, she said, could improve accuracy and reduce algorithmic bias.
She also said that incorporating fairness constraints — conditions imposed during model training that limit disparities in predictions or error rates across groups — can help reduce bias, even if they don’t completely eliminate it. Gándara and Anahideh are developing open-source materials for institutional researchers and other data practitioners that can help audit and mitigate bias in student-success models.
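As an illustration only (the study does not name a specific tool), a fairness constraint such as equalized odds can be imposed during training with the open-source fairlearn library; the data and variable names below are hypothetical, continuing the earlier sketch:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from fairlearn.reductions import ExponentiatedGradient, EqualizedOdds

# Hypothetical data, as in the earlier sketch; "race" is the sensitive attribute.
data = pd.read_csv("historical_students.csv")   # assumed file
X = data[["test_score", "hs_gpa", "pell_eligible", "first_generation"]]
y = data["graduated_in_8_years"]
A = data["race"]

X_train, X_test, y_train, y_test, A_train, A_test = train_test_split(
    X, y, A, test_size=0.2, random_state=0
)

# Equalized odds asks for similar true- and false-positive rates across groups;
# ExponentiatedGradient enforces that constraint while training the base model.
mitigator = ExponentiatedGradient(
    estimator=LogisticRegression(max_iter=1000),
    constraints=EqualizedOdds(),
)
mitigator.fit(X_train, y_train, sensitive_features=A_train)
fair_predictions = mitigator.predict(X_test)
```

Even with a constraint like this, disparities typically shrink rather than disappear, which is consistent with the researchers’ finding that mitigation techniques were only partly effective.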
Gándara recommends that administrators and faculty members be trained on how the algorithms work. Student-success models are merely tools, she said; what matters is how people use them.
“Rather than uncritically accepting model output as prophecy or dismissing it altogether,” she said, “predictions should be used as a starting point to ask questions, such as, ‘Why am I seeing that my Black and Hispanic students are more likely to fail?’ and more importantly, ‘What can I do about that?’”