Prediction of Student Behaviour Problems at Home or School

Phoebe Yu

Age 17 | Surrey, British Columbia

Second Place, Behavioral Sciences for Delaware County Science and Engineering Fair | Honorable Mention at Delaware Valley Science Fair | Runner-Up, New York Times 2020 STEM Writing Contest

INTRODUCTION

An emerging body of research suggests that existing behavioral issues may hinder many children’s development (Gortmaker, Steven L., et al., 1990). Disruptive disorders such as conduct disorders, which affect students both at home and at school, often lead to recurring negative symptoms such as substance abuse, depressive tantrums, and destruction of property (Kauffman, James M., 1997). Childhood behavioral problems can take their toll far beyond the developmental years, possibly into adulthood. In the United States, approximately 1 in 6 children aged 2-8 years had a diagnosed mental, behavioral, or developmental disorder. In addition, 7.4% of children aged 3-17, around 4.5 million, have a diagnosed behavioral problem. Moreover, among children aged 3-17 years with behavioral problems, more than one third also have anxiety and about one fifth also have depression. About 3 in 4 children aged 3-17 years with depression also have anxiety, and about 1 in 2 also have behavioral problems (Cree RA et al., 2016).

Previous studies have examined the correlation between several basic factors, such as age, and the presence of behavioral problems in children. For instance, one study indicates that behavioral problems are more prevalent among children aged 6-11 years than among younger or older children (Ghandour RM, et al., 2019). On the other hand, the significance of gender as a predictive factor is debated. In one study, gender differences among children with behavioral problems at school were not significant (Kristoffersen J., Smith N., 2013); in another, males appeared more vulnerable to developing behavioral disorders (Morrison N., 2016). However, many subtler predictors of such delinquent behaviors, beyond general factors like race, gender, and age, have not been deeply analyzed. Moreover, few sources on this topic examine more than 5-6 variables at once.

The first aim of this study is to critically examine the specific predictors of behavioral problems at home or school among students aged 5-17. This age range was chosen because 18 is the legal adult age in the US and this study is specific to children and adolescents. Children under 5 years old are not considered, as it is very rare for them to receive a diagnosis of serious behavioral problems (Gardner F., & Shaw D. S., 2008). The study considers behavioral problems both at home and at school, as children may behave differently across environments (deBros Kathryn, 2014). The second aim is to build a predictive model for student behavioral problems at home or school using an artificial neural network and to compare its performance with that of a logistic regression model. Logistic regression is used because it is the most common analysis for binary outcome variables (e.g., yes/no, like/dislike, tall/not tall). An artificial neural network is used because it can capture interactions between variables and complex relationships among them; moreover, it is a relatively newer model that can be appropriately compared with the logistic regression (logit) model. By using reliable binary statistical models to interpret the relationship between the predictive variables and the outcome (behavioral problem), it may be possible to predict student behavioral problems at home or school more precisely.

 Although behavioral issues can vary in severity, it is essential to detect them early on, so that parents and educators are able to provide timely intervention. With the necessary aid, children may be less likely to develop further adverse conditions.

HYPOTHESIS

It is hypothesized that race, ethnicity, gender, age, and economic status are all effective variables to predict the likelihood of a student aged 5-17 having behavior problems at home or school.

METHODOLOGY

The data used in this study come from the 2016 Medical Expenditure Panel Survey (MEPS). The MEPS, which began in 1996, is a comprehensive set of large-scale surveys of families, individuals, their medical providers (such as doctors and pharmacists), and employers across the United States. The MEPS has two main components: the Household Component (HC) and the Insurance Component. This study uses only the Household Component, as it chiefly provides data from individual households as well as their medical providers. The sample of families in the HC data was drawn from nationally representative communities across the United States, all of whom had also participated in the National Health Interview Survey. In 2016 alone, 33,259 people and 13,587 families were surveyed. During the interviews, MEPS-HC collects detailed information from each person in a household on multiple topics, including demographic characteristics, health status, access to care, and income. This provides the variables used in this study. Because of the complexity of the questionnaire, interviews are conducted using computer-assisted personal interviewing (CAPI), which involves laptop computers. Finally, because of the panel design of the survey, which entails multiple rounds of interviews carried out over two full calendar years, it is possible to determine how changes in respondents’ health status, income, and other variables are related.

One of the HC questionnaire sections addresses the welfare of the children in the surveyed families. The Child Preventive Health section asks about the child’s general health status, special health care needs, potential behavioral problems, height, weight, and so on. A behavioral problem (occurring at school, at home, or both) is defined by MEPS and rated on a 0-4 scale. For this study, a child is determined to have behavioral problem(s) if the rating is greater than 0.

Models

1) Artificial Neural Network (ANN): a machine learning algorithm consisting of interconnected groups of artificial neurons that process information and can be trained to perform tasks such as identifying patterns in data. The structure of an ANN mimics that of the biological neural network of a human brain. Also known as a “neural network,” an ANN is an adaptive system that changes its structure based on the information that flows through the network during the learning phase. In other words, it is a modeling tool that can be “trained” to identify the complex relationships between variables in a set of data. One practical use of ANNs is data mining, where data are analyzed to uncover new patterns and ideas. Thus, the purpose of using this model in this study is to process the variables chosen from the MEPS and to recognize the relationships between them, as well as to determine their level of importance as predictive factors for the tested outcome (Hardesty, Larry, 2017).

2) Logistic Regression: a machine learning method that can classify new samples and provide probabilities using continuous (any value within an interval, such as weight), discrete (countable numbers, such as the number of students), or dichotomous (only two answers, such as obese or not obese) measurements. This method is used in this study to calculate the predicted risk of behavioral problems for students. Unlike linear regression, where the dependent variable “Y” is assumed to be normally distributed, logistic regression treats “Y” as binary. The model is fit to the data by maximum likelihood. The logistic regression model can be expressed with the formula (Agresti, Alan, 1996):

ln(P / (1 - P)) = β0 + β1*X1 + β2*X2 + ... + βn*Xn
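To illustrate how the formula yields a predicted risk, the log-odds on the right-hand side can be inverted to a probability, P = 1 / (1 + exp(-(β0 + β1*X1 + ... + βn*Xn))). The following is a minimal Python sketch (the study itself used R); the function name and the coefficient values are hypothetical and are not taken from the fitted model.

```python
import math

def predicted_probability(intercept, coefficients, values):
    """Invert the logit: P = 1 / (1 + exp(-(b0 + sum(bi * xi))))."""
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical coefficients for illustration only (not the study's estimates).
b0 = -1.0
betas = [0.3, 0.5]   # stand-ins for, e.g., Male and MNHLTH53
student = [1, 2]     # a male student with a mental health rating of 2

p = predicted_probability(b0, betas, student)
print(round(p, 3))   # log-odds of 0.3 correspond to a probability of about 0.574
```

A log-odds value of zero corresponds to a probability of exactly 0.5, so positive coefficients push the predicted risk above one half and negative coefficients push it below.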

Other Tools

Neuralnet: an existing package in R, a language and environment for statistical modeling and computing, that is used to train neural networks. Neuralnet centers on multilayer perceptron neural networks, meaning ANNs with multiple layers. The input layer (carrying covariates) and the output layer (carrying response variables) of an ANN are directly observable, while the “hidden layers” in the middle are not. This is because the “synapses” of the hidden-layer neurons connect only to other layers of neurons. Ultimately, Neuralnet trains the ANN by feeding it input data (the training sample) together with the corresponding output pattern; this method is known as supervised learning.

ROC (Receiver Operating Characteristic): a graphical plot that illustrates the predictive power of binary predictive models. It is mainly used to compare the diagnostic ability of multiple binary classifiers. The ROC curve is created by plotting the true positive rate, or sensitivity, against the false positive rate (1 - specificity) at various thresholds. The true positive rate indicates successful classification or detection, whereas the false positive rate indicates false alarms. By plotting different ROC curves on the same graph, in this case those of the logistic regression and the artificial neural network, and by comparing the area under each curve (AUC), the model with the greater predictive power is identified.
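The ROC construction described above can be sketched in a few lines. This is a minimal Python illustration (not the R code used in the study) that assumes binary labels coded 0/1 with at least one case of each class; the function names and the toy data are hypothetical.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Sweep thresholds from the highest score down; tied scores share a threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Toy data: 1 = behavior problem, 0 = none; scores are model-predicted risks.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
print(auc(roc_points(labels, scores)))  # 7 of 9 case/non-case pairs are ranked correctly
```

An AUC of 0.5 corresponds to ranking cases and non-cases no better than chance, while an AUC of 1.0 corresponds to perfect separation.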

Corrgram: a display generated in R to help the reader visualize the results. It is a graphical representation of a correlation matrix, a table that shows the correlation coefficients (strength of relationship) between variables. Using shading, color, and surface area, a corrgram conveys the sign and magnitude of the correlation between each pair of variables. In the results below, positive correlations are shown in blue and negative correlations in red. A darker hue represents a greater magnitude of correlation.
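The table of coefficients that underlies such a display can be sketched as follows. This Python illustration assumes Pearson correlation, which the text does not specify; the function names and the toy values are hypothetical and are not MEPS data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Nested dict of pairwise correlations, keyed by variable name."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Toy values for illustration only.
toy = {
    "MNHLTH53": [1, 2, 3, 4, 5],
    "RTHLTH53": [1, 2, 2, 4, 5],
    "AGE16X": [17, 14, 11, 8, 5],
}
matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

In a corrgram, each cell of this matrix would be drawn as a shaded tile, blue for positive values and red for negative ones.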

Variables

The outcome variable is based on HOMEBH42 (PROBLEM W/BEHAVIOR AT HOME (5-17)-R4/2) and SCHLBH42 (PROBLEM W/BEHAVIOR AT SCHOOL (5-17)-R4/2). The level of behavior problem was rated on a 0-4 scale in MEPS 2016, with “0” indicating no behavioral problem and “4” indicating a severe behavioral problem. For this project, a student was defined as having behavior problem(s) if HOMEBH42, SCHLBH42, or both had a value greater than zero.
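The outcome definition above amounts to a simple rule, sketched here in Python for illustration (the study itself used R). The function name is hypothetical. The sketch applies the stated rule literally, so any value that is not greater than zero, including any negative code a survey file might use for missing responses, is counted as no problem; how the study handled missing values is not stated.

```python
def has_behavior_problem(homebh42, schlbh42):
    """Return 1 if either rating is greater than zero, else 0."""
    return 1 if (homebh42 > 0 or schlbh42 > 0) else 0

# A few hypothetical rating pairs (home, school).
examples = [(0, 0), (2, 0), (0, 1), (4, 3)]
outcomes = [has_behavior_problem(home, school) for home, school in examples]
print(outcomes)  # [0, 1, 1, 1]
```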

There are 10 exposure variables. “POVCAT16” represents the income level of the subject’s family; it is rated from 1 to 5, with “1” representing a negative income level and “5” representing high income. “RTHLTH53” measures the subject’s health status and is also rated from 1 to 5, with “1” representing excellent health status and “5” representing poor health. “MNHLTH53” measures the student’s perceived mental health status and is rated from 1 to 5 in the same fashion as the previous variable. “AGE16X” measures the subject’s age as of December 31st, 2016. “Male” is a binary variable that indicates whether the student has an assigned sex of male. “Black,” “White,” and “Asian” are binary variables that indicate whether the subject is of the named race. “Hispanic” is another binary variable that indicates whether the student is of Hispanic ethnicity. Finally, “HCNEEDS” is a binary variable that indicates whether the student has special health care needs.

RESULTS

Approximately 33.7% of the 6,616 students had behavior problems at home or school, with around 30.3% among females and 37.0% among males.

Table 1. Logistic regression for behavior problem at home or school among students aged 5-17. * = Statistically significant: Pr < 0.05

Figure 1. Matrix of correlations between variables. Red = negative. Blue = positive. The relationship between the 2 variables: darker hue = greater correlation.

According to the logistic regression, male students, younger children, and children with worse mental health or better physical health were more likely to have behavior problems. Students who require special healthcare services were also more likely to have behavior problems at home or at school. Compared to other racial populations, Caucasian (White), African-American (Black), and Asian participants were less likely to have behavior problems. Compared to students of other ethnicities, Hispanic students were less likely to have behavior problems at home or school. The result for the income status variable (POVCAT16) was not statistically significant; therefore, the null hypothesis that income status has no association with behavior problems is not rejected.

Figure 2: Artificial neural network in training sample

In the plot above, a weight is attached to each connection between neurons, reflecting how strongly each input contributes to the next layer. The line thickness represents the weight’s magnitude. The line color indicates the weight’s sign (black = positive, grey = negative). The net is essentially a black box, which means that not much information can be gathered from studying the fitting, weights, and structure of this neural network. In other words, there is no precise link between the weights and the approximated function; only the input and output layers are directly interpretable. The training algorithm converged, so the model is ready to be used.
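To make concrete how weights like those in the plot turn inputs into a prediction, the following Python sketch computes one forward pass through a network with a single hidden layer. It assumes logistic (sigmoid) activation at both the hidden and output layers, which the text does not state; the function names and all weights are hypothetical and are not the fitted values from Figure 2.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass: inputs -> hidden layer -> single output in (0, 1)."""
    hidden = [
        sigmoid(bias + sum(w * x for w, x in zip(weights, inputs)))
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(output_bias + sum(w * h for w, h in zip(output_weights, hidden)))

# Hypothetical weights: two inputs, two hidden neurons, one output.
risk = forward(
    inputs=[1, 0],
    hidden_weights=[[0.5, -0.2], [-0.3, 0.8]],
    hidden_biases=[0.0, 0.0],
    output_weights=[1.0, -1.0],
    output_bias=0.0,
)
print(round(risk, 3))
```

Because each hidden neuron mixes all of the inputs before the output neuron mixes the hidden values, no single weight maps cleanly onto one variable’s effect, which is why the fitted network is hard to interpret directly.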

Figure 3: Variable importance in artificial neural network

According to this neural network, the top 5 most important predictors were Asian race, mental health status, Black (African-American) race, physical health status, and special healthcare needs.

Figure 4: ROC in training sample for logistic regression (red) vs. neural network (blue).

Figure 5: ROC in testing sample for logistic regression (red) vs. neural network (blue).

For the training sample, the area under the ROC curves was 0.69 for the logistic regression and 0.74 for the artificial neural network. In this case, the artificial neural network evidently performed better. However, in the testing sample, the area under the ROC curves was 0.69 for the logistic regression and 0.68 for the artificial neural network; both models had similar performance in terms of successful prediction.
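The comparison can be summarized by the drop in AUC from the training sample to the testing sample for each model. The short Python sketch below uses only the four AUC values reported above; the variable names are hypothetical. A larger drop would be consistent with a model fitting the training sample more closely than it generalizes, although this interpretation is offered here only as a possibility.

```python
# AUC values as reported in the text.
auc_values = {
    "logistic regression": {"training": 0.69, "testing": 0.69},
    "neural network": {"training": 0.74, "testing": 0.68},
}

# Drop in AUC from the training sample to the testing sample.
gaps = {
    model: round(values["training"] - values["testing"], 2)
    for model, values in auc_values.items()
}
print(gaps)  # {'logistic regression': 0.0, 'neural network': 0.06}
```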

DISCUSSION

In summary, approximately one third of the 6,616 sampled students had behavioral problem(s). Stratified by gender, around 30.3% of females and 37.0% of males had behavioral problem(s). According to the neural network, the top 5 most important predictors were, in order: Asian race, mental health, African-American race, physical health, and special needs for healthcare.

Age

According to the logistic regression, older students were less likely to have behavior problems at home or school. This could be explained by humans’ mental (and physical) maturation as they age. When children undergo puberty, they experience biological, physical, and emotional development. During this phase, adolescents may display fluctuating emotions and behaviors. Through puberty, they develop emotional regulation and, generally, a more mature mindset. Therefore, behavioral problems may be less likely in older children.

Gender

In the logistic regression, male students were more likely to have behavioral problems than female students. Evidence suggests that boys are not genetically or inherently predisposed to a greater likelihood of developing behavioral problems (Kristoffersen J., Smith N., 2013). This calls for a deeper investigation into the environmental and societal factors behind the difference. A notable factor could lie in how educators, supervisors, or parents respond to delinquent behaviors of boys versus those of girls. Studies suggest that educators and supervisors tend to have harsher views of, and reactions to, boys with behavioral issues, thus affecting the educational attainment of male students (Morrison N., 2016). This, in turn, may contribute to the greater prevalence of behavioral problems among male students compared with female students.

Race and Ethnicity

The neural network indicates that 2 of the top 5 predictors were Asian and Black race, suggesting that race is a highly important factor in predicting behavioral problems among students. The logistic regression indicates that Caucasian, African-American, Asian, and Hispanic students were less likely to have behavioral problems than students of other racial populations. In other words, the groups selected as variables in this study had a lower likelihood of having behavioral problems than students of other or unknown races/ethnicities. The reasoning behind this may lie in complex societal factors such as discrimination and the socioeconomic status of different races. In a survey of 2,490 adolescents of low socioeconomic status and racial/ethnic minority background, 73 percent experienced racial/ethnic discrimination (Tobler, Amy L., et al., 2013). Those adolescents who experienced racial/ethnic discrimination were more likely to report greater physical aggression, delinquency, and other behavioral issues. This suggests that discrimination, regardless of intensity, can increase a victimized child’s likelihood of developing behavioral problems. Moreover, the cultural stigma of receiving mental health care may be another possible explanation of the result. For instance, members of some cultures may not openly access mental health services because of family, cultural, or racial restrictions (Gopalkrishnan, N., 2018).

Mental Health, Physical Health and Special Healthcare Needs

According to the logistic regression, students with worse mental health or better physical health were more likely to have behavioral problems. The correlation between mental health status and behavioral problems is logical; behavioral health is a blanket term that includes mental health. On the other hand, the correlation between physical health and behavioral problems may need further explanation. Based on the results, students with better physical health have a greater likelihood of having behavioral problems. In contrast to this result, studies indicate that better physical health generates positive effects in children, such as greater cognitive abilities, better mental health, and a lower likelihood of behavioral issues (King G., et al., 2005). However, a child with greater physical abilities may find it easier to display behavioral issues. For instance, children with a conduct disorder (a type of behavioral problem) may be able to cause greater destruction if they are physically capable. In the neural network, mental health and physical health are the 2nd and 4th most important predictive variables, respectively.

Results suggest that students with special needs for healthcare were more likely to have behavioral problems at home or at school. This result is well expected, as special healthcare needs target a wide span of issues, including physical, developmental, or behavioral disorders (Bethel, Read, Stein, et al., 2002). Thus, students with special needs for health care may naturally include those who may have behavioral problems. In the neural network, special healthcare needs are the 5th most important predictor for this project.

Poverty Level

Although the economic status of a student’s family has a slight importance, as depicted in the artificial neural network, neither model suggests that this variable has sufficient predictive power for the topic at hand. This result is surprising; many studies have established a strong positive association between poverty and the likelihood of developing behavioral problems (Simon, K., et al., 2018). In the logistic regression, the result is not statistically significant, meaning that chance may have affected it. In the neural network, poverty level is only the 6th most important predictor (less than 1 percent). The reasoning behind the lack of influence of the poverty variable must be further researched. The slight importance that remains is possibly due to its interconnection with other variables, such as race and ethnicity. Minorities such as African-Americans, Hispanic people, and Native Americans have an approximate 20 percent poverty rate in the US (Sauter M., 2018). Given the connection between racial/ethnic minority status and the likelihood of behavioral problems, it would be reasonable to expect poverty to be an effective predictive variable.

Limitations of the Study

Some known factors that may predict student behavior problems at home or school, such as parents’ marital status and education level, were not readily available in this study. These factors may act as confounding variables and affect the results of the study. For instance, previous studies have shown that marital disruption can result in negative behavioral effects for children (Wallenborn J., 2019). Adolescents of divorcing parents may endure high levels of stress and negative emotions due to family instability. The effect of marital disruption may also vary in strength with variables such as race or socioeconomic status.

The external validity of both the logistic regression and the ANN has not been tested, so the conclusions cannot yet be generalized beyond the context of this study. However, a split-sample validation was performed for both models; each was evaluated on a separate testing sample. Future studies could use outside, newer data to test the performance of the two models in this study.

In order for the results to be more accurate and applicable to the stated population, the sample size of this study must be increased. Future research may also focus on determining a more appropriate population parameter.

Both models had similar discriminating capabilities, as measured by the ROC curve. The AUC indicates the ability of each model to tell the cases (students with behavioral problems) from the non-cases (students without behavioral problems); it is a key index of a model’s predictive power. Both the logistic regression and the ANN have an acceptable AUC level for the testing and training samples.

CONCLUSION

In this study, several important predictors for student behavior problems at home or school were identified, examined, and tested. The results provide information that allows educators, parents, and advisors to provide timely intervention. After analyzing possible explanations for why certain variables are more predictive than others, it is possible to mitigate these situations and thus lower the likelihood of children developing behavioral problems. For instance, taking into consideration that discrimination and/or socioeconomic status may increase the likelihood of minorities having behavioral problems, one can discuss these problems with the at-risk children and support them appropriately. Adult supervisors may also seek opportunities to remove at-risk children from these stressors as soon as possible.

Two mathematical models were constructed using information from the MEPS. First, several predictive variables were chosen after examining other empirical studies. Then, they were entered into a logistic regression model, and a corrgram was developed to showcase the relationships between the variables. An artificial neural network was created to measure the predictive importance of each variable. Finally, the two models were compared for their discriminating capabilities using the ROC curve. Both had similar performance levels.

 The most important predictors were Asian students, mental health, Black students, physical health, and special healthcare needs (in this exact order).

 The factors that suggest a higher likelihood of the student having behavioral problems are better physical health, worse mental health, special needs for healthcare, younger age, male sex, a race other than Caucasian, African-American or Asian, and an ethnicity other than Hispanic. By analyzing the factors that may contribute to these results, educators and adult supervisors can seek resources early on and provide intervention before symptoms of behavioral problems worsen.

After conducting the research study, the hypothesis is not fully supported. Economic status was not a valid predictive variable, as it was not statistically significant; this finding must be further researched. In sum, a predictive model was developed using artificial neural networks as well as logistic regression, and it can act as a tool for early detection of student behavior problems at home or school.

REFERENCES

Arky, B. (n.d.). Why Are Kids Different at Home and at School. Retrieved from https://childmind.org/article/kids-different-home-school/

Banta, J. E., James, S., Haviland, M. G., & Andersen, R. M. (2012). Race/Ethnicity, Parent-Identified Emotional Difficulties, and Mental Health Visits Among California Children. The Journal of Behavioral Health Services & Research, 40(1), 5-19. doi:10.1007/s11414-012-9298-7

Das, S. R. (2017, March 24). Data Science: Theories, Models, Algorithms, and Analytics. Retrieved from https://srdas.github.io/MLBook/

Definition of Special Health Care Needs. (n.d.). Retrieved from https://www.aapd.org/research/oral-health-policies--recommendations/special-health-care-needs/

Fauth, R. C., Platt, L., & Parsons, S. (2017). The development of behavior problems among disabled and non-disabled children in England. Journal of Applied Developmental Psychology, 52, 46-58. doi:10.1016/j.appdev.2017.06.008

Freeman, D., Medical Research Council, & Department of Psychiatry. (2017, November 15). Are boys genetically predisposed to behavioural problems? [excerpt]. Retrieved from https://blog.oup.com/2017/11/boys-genetically-predisposed-to-behavioural-problems/

Gardner, F., & Shaw, D. S. (2008). Behavioural problems of infancy and pre-school children. In M. Rutter, D. Bishop, D. Pine, S. Scott, J. Stevenson, E. Taylor, & A. Thapar (Eds.), Rutter’s child and adolescent psychiatry, 5th edition (pp. 882-894). London: Blackwell Press.

Ghandour, R. M., Sherman, L. J., Vladutiu, C. J., Ali, M. M., Lynch, S. E., Bitsko, R. H., & Blumberg, S. J. (2019). Prevalence and treatment of depression, anxiety, and conduct problems in US children. The Journal of pediatrics, 206, 256-267.

Gopalkrishnan, N. (2018). Cultural diversity and mental health: Considerations for policy and practice. Frontiers in public health, 6, 179.

Gortmaker, S. L., Walker, D. K., Weitzman, M., & Sobol, A. M. (1990). Chronic conditions, socioeconomic risks, and behavioral problems in children and adolescents. Pediatrics, 85(3), 267-276.

Hardesty, L. (2017, April 14). Explained: Neural networks. Retrieved from http://news.mit.edu/2017/explained-neural-networks-deep-learning-0414

Kauffman, J. M. (1997). Characteristics of emotional and behavioral disorders of children and youth. Merrill/Prentice Hall, One Lake Street, Upper Saddle River, NJ 07458.

King, G., McDougall, J., DeWit, D., Hong, S., Miller, L., Offord, D., ... & LaPorta, J. (2005). Pathways to children’s academic performance and prosocial behaviour: Roles of physical health status, environmental, family, and child factors. International Journal of Disability, Development and Education, 52(4), 313-344.

Kristoffersen, J., & Smith, N. (2013). Gender Differences in the effects of behavioral problems on school outcomes.

Lechtenberg, Ula. (n.d.). Research Guides: Organizing Academic Research Papers: Purpose of Guide. Retrieved from https://library.sacredheart.edu/c.php?g=29803&p=185901

Morrison, N. (2016, June 22). Poor Behavior Hits Boys Hardest. Retrieved from https://www.forbes.com/sites/nickmorrison/2016/06/22/poor-behavior-hits-boys-hardest/#45ff00ca38e8

Narkhede, S. (2019, May 26). Understanding AUC - ROC Curve. Retrieved from https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5

Peng, C. Y. J., Lee, K. L., & Ingersoll, G. M. (2002). An introduction to logistic regression analysis and reporting. The journal of educational research, 96(1), 3-14.

Sauter, M. B. (2018, October 10). Faces of poverty: What racial, social groups are more likely to experience it? Retrieved from https://www.usatoday.com/story/money/economy/2018/10/10/faces-poverty-social-racial-factors/37977173/

Simon, K., Beder, M., & Manseau, M. (2018). Addressing poverty and mental illness. Psychiatric Times, 35(6), 1-4.

Singh, Y., & Chauhan, A. S. (2009). NEURAL NETWORKS IN DATA MINING. Journal of Theoretical & Applied Information Technology, 5(1).

Tabachnick, B. G., Fidell, L. S., & Ullman, J. B. (2007). Using multivariate statistics (Vol. 5). Boston, MA: Pearson.

Tobler, Amy L., et al. "Perceived Racial/Ethnic Discrimination, Problem Behaviors, and Mental Health among Minority Urban Youth."

United States. Public Health Service. Office of the Surgeon General, Center for Mental Health Services (US), National Institute of Mental Health (US), United States. Substance Abuse, & Mental Health Services Administration. (2001). Mental health: Culture, race, and ethnicity: A supplement to mental health: A report of the Surgeon General (Vol. 2). Department of Health and Human Services, US Public Health Service.

Venables, W. N., Smith, D. M., & R Development Core Team. (2009). An introduction to R.

Wallenborn, J. T., Chambers, G., Lowery, E., & Masho, S. W. (2019). Marital Status Disruptions and Internalizing Disorders of Children. Psychiatry Journal, 2019.

Phoebe Yu

A service-oriented student, advocate, leader, and teammate who strives to bridge the gap between scientific inquiry and the youth community, Phoebe is a young entrepreneur who loves to study psychology and other behavioral science-related topics. She is currently attending Southridge School. Beyond exploring psychological sciences, her passions include service, visual arts, entrepreneurship, and equestrian sports. Phoebe is also working with a social psychology graduate student at UBC, helping with their current project on narcissism. She looks forward to future opportunities to explore psychology and apply her knowledge to help improve the world.