The Influence of Physical Activity on Stress-associated Conditions in Higher Education Students

Objectives. The purpose of the study was to substantiate the influence of physical activity on stress-associated conditions in higher education students. Materials and methods. The dataset for building the models consisted of 1115 observations with 16 independent and 3 dependent variables. The main method was the random forest, which obtains a forecast by aggregating the predictions of a set of individual decision trees, each trained on a data subset drawn from the studied sample. Results. Physical activity (PA) was found to be the most important factor in predicting stress-related conditions in university students. In addition, PA involving moderate and high levels of energy expenditure, as well as the number of stressful events experienced, played a significant role in predicting stress among students. To predict stress-related conditions in higher education students, the models “Stress”, “Increased anxiety”, and “Risk of PTSD” were built using the random forest method. The “Stress” model had the highest quality: its Accuracy was 0.77, Recall 0.86, Precision 0.79, and F1 Score 0.82. The “PTSD Risk” model correctly predicted 78% of cases, which indicates good overall performance; however, it correctly identified only 23% of the students who actually had signs of this disorder. Anxiety is less stable than stress and PTSD, which can make model training difficult; the anxiety model had an average accuracy of 56%, as well as reduced recall (completeness) and balance. Conclusions. Models for predicting increased anxiety and identifying students with signs of PTSD require further improvement. Implementing the developed models makes it possible to quickly identify manifestations of stress-related conditions in higher education students and to take the necessary PA-based measures to prevent the development of stress-related disorders.


Introduction
Against the background of heightened stress factors in the country, stress-related conditions are highly prevalent (Pavlova et al., 2022; Vypasniak et al., 2023; Kurapov et al., 2023), namely stress, anxiety, and post-traumatic stress disorder (PTSD) among students (Rogowska & Pavlova, 2023; Zaitsev, 2023; Meshko et al., 2023). The issue has therefore become particularly actively discussed by experts in different fields (Limone et al., 2022). Our previous research shows that physical education (PE) also has considerable potential to help students counteract powerful stress factors (Andrieieva et al., 2023; Byshevets et al., 2022, 2023). So far, we have established that the use of physical activity (PA) helps students develop stress resistance, builds resistance to anxiety, and prevents the emergence of PTSD. Therefore, research aimed at identifying cases of stress-related conditions among students and determining the most significant factors that influence their manifestation will make it possible to take the necessary PA-based measures and prevent the development of stress-related health disorders in students.
ISSN 1993-7989. eISSN 1993-7997. ISSN-L 1993-7989. Physical Education Theory and Methodology. Vol. 24, Num. 2
Modern researchers use data mining methods based on artificial intelligence, including machine learning, deep learning, and natural language and image processing. Applying these methods makes it possible to find hidden patterns, trends, and interconnections, to base decisions on empirical data rather than intuition, and to predict the development of objects and events (Kim et al., 2022; Nachouki et al., 2023).
Data mining and machine learning have significant potential in the field of training specialists (Li & Wang, 2022). Researchers argue that machine learning models make it possible to determine the key factors affecting learning success and contribute to a deeper understanding of the mechanisms and patterns of educational activity (Nachouki et al., 2023).
Ding et al. (2023) proposed an improved system for evaluating the educational achievements of university students in PE that is more objective than the traditional system.
The article written by Wang et al. (2022) presents information on the method of evaluating the quality of PE teaching and learning which is based on deep learning.
One data mining method increasingly used by experts in various fields of knowledge is the random forest (RF) method, a machine learning approach based on the RF algorithm (Schonlau & Zou, 2020).
There is evidence of the successful application of the RF method in pedagogy for predicting students' academic performance. For example, Nachouki et al. (2023), using the RF method, established that a student's most recent cumulative average score is the most important factor determining their further educational achievements.
Xu and Yin (2021) report the successful application of the RF method in the practice of PE. To improve the quality of PE, these authors identified the factors affecting the success of students in PE and, using an improved iterative RF algorithm, constructed a model that predicts students' grades from their scores for each factor. The accuracy of the proposed model reached 88.55%.
Briand et al. (2022) introduced a data mining system to predict injuries to athletes during the next 7-day training microcycle. The constructed models correctly predict more than 50% of future injury days and more than 70% of future injury-free days.
Jiang et al. (2023) also used the RF method to improve the assessment of the quality of PE teaching. The model makes it possible to determine the weaknesses and strengths of students, identify learning problems, and provide appropriate recommendations.
There is also some evidence of the use of the RF model to classify stress based on the personal characteristics of employees with an accuracy of up to 81% (Kim et al., 2022). However, despite the significantly greater predictive power of the RF compared with a single decision tree, researchers have faced the problem of the complexity of interpreting the constructed models. This significantly limits their use in PE research.

Study Participants
The presented study analyzed the results of a survey of 1115 students from different regions of Ukraine covering 2022-2023. Among the respondents, 42.8% were male and 57.2% were female; 40.2% reported a negative experience of being in the epicenter of hostilities. The average age of the respondents was 20.0 ± 3.9 years. During the hostilities, higher education students experienced from 0 to 9 stressful events; the median was 2 (1; 3) events. The research was carried out in accordance with bioethical requirements.

Study Organization
The PA of students was measured with the short form of the International Physical Activity Questionnaire (IPAQ, 2016), in points; stress was measured with the Shcherbatykh questionnaire (Shcherbatykh, 2002), in points; reactive anxiety was measured with the Spielberger-Hanin test (STAI), in points; and signs of PTSD were measured with the Mississippi scale for the assessment of post-traumatic reactions (civilian version) (Sloan et al., 1995), in points. The data set consisted of 1115 observations, each with 16 attributes. The categorical attributes are gender, direction of education, negative experience, and strengthening of unhealthy habits. The remaining attributes, including the number of stressful events, PA, sleep, mood, etc., are quantitative.
The description of the input data is given in Table 1.
Dichotomous dependent variables characterized the presence of stress-related conditions. For the respective models, stress (stress score above 12 points), increased anxiety (anxiety score above 45 points), and signs of PTSD (PTSD score above 100 points) were coded as 1, and their absence as 0. In total, 63.9% of respondents had stress, 41.5% had increased anxiety, and 23.0% had symptoms of PTSD.
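The coding rule for the dependent variables can be illustrated with a minimal Python sketch. The thresholds come from the text; the function name, the dictionary keys, and the respondent's scores are hypothetical and serve only as an illustration.

```python
# Cut-off scores taken from the text: a condition is coded 1 when the
# questionnaire score strictly exceeds the threshold, otherwise 0.
THRESHOLDS = {"stress": 12, "anxiety": 45, "ptsd": 100}


def dichotomize(scores: dict) -> dict:
    """Map raw questionnaire scores to 0/1 labels, one per model."""
    return {name: int(scores[name] > cutoff) for name, cutoff in THRESHOLDS.items()}


# Hypothetical respondent (scores invented for illustration only).
respondent = {"stress": 15, "anxiety": 45, "ptsd": 88}
labels = dichotomize(respondent)
print(labels)  # {'stress': 1, 'anxiety': 0, 'ptsd': 0}
```

A score equal to the cut-off (anxiety of 45 here) is coded 0, because the text describes the score as exceeding the threshold.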

Statistical Analysis
The RF method, an ensemble method of machine learning, was chosen as the key research method. It is a powerful tool for classification and regression modeling that aims to reduce variance and overfitting and also evaluates the importance of variables. The RF method was used to predict the stress-related conditions of students, namely stress, increased anxiety, and the presence of PTSD symptoms.
Byshevets, N., Andrieieva, O., Dutchak, M., Shinkaruk, O., Dmytriv, R., Zakharina, Ie., Kostiantyn Serhiienko, K., & Hres, M. (2024). The Influence of Physical Activity on Stress-associated Conditions in Higher Education Students
The implementation of the RF method involves generating a set of random subsets from the data set, for each of which a decision tree is constructed. In constructing RF trees, a non-linear approach is used to identify interconnections between attributes, which allows a wider range of attributes to be identified (Nachouki et al., 2023). Each decision tree makes a prediction for new data, and the final prediction in our case is determined by majority vote (Fig. 1). The parameter settings described below simplify the models and prevent them from being overfitted.
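The two ingredients named above, random subsets for individual trees and a majority vote over their predictions, can be sketched in a few lines of Python. This is an illustration only: the helper names are hypothetical, the per-tree votes are invented, and sampling without replacement is an assumption, since the text does not say how the subsets were drawn.

```python
import random
from collections import Counter


def draw_subset(n_rows: int, fraction: float, rng: random.Random) -> list:
    """Pick a random subset of row indices for training one tree.

    Assumption: rows are drawn without replacement.
    """
    k = int(n_rows * fraction)
    return rng.sample(range(n_rows), k)


def majority_vote(tree_predictions: list) -> int:
    """Return the class predicted by most trees (ties go to the class seen first)."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)
subset = draw_subset(n_rows=1115, fraction=0.5, rng=rng)
print(len(subset))  # 557

votes = [1, 0, 1, 1, 0]  # hypothetical predictions of five trees for one student
print(majority_vote(votes))  # 1
```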
Model training. To train the model, 50% of the data was randomly selected from the data set. The construction of each decision tree involved the random selection of 5 attributes from the 16 independent variables. Assessment of accuracy. The accuracy of the model was checked on 30% of the observations, which did not participate in training.
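For readers who work in Python rather than STATISTICA, the settings reported in this section (100 trees, 5 random predictors, a depth of at most 10 levels, at least 5 observations to split a node, a 50% training subsample, a 30% test sample) could be approximated with scikit-learn as sketched below. This is not the authors' implementation: the data are synthetic, and the mapping of the STATISTICA options onto `max_features`, `max_samples`, and the other arguments is an assumption. In particular, scikit-learn draws the 5 candidate predictors at every split rather than once per tree.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the survey: 1115 rows, 16 predictors, binary target.
X, y = make_classification(n_samples=1115, n_features=16, n_informative=6,
                           random_state=0)

# Hold out 30% of the observations for testing, as in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y)

forest = RandomForestClassifier(
    n_estimators=100,     # 100 trees per forest
    max_features=5,       # 5 randomly chosen predictors (per split in scikit-learn)
    max_depth=10,         # tree depth of at most 10 levels
    min_samples_split=5,  # a node needs at least 5 observations to be split
    max_samples=0.5,      # each tree sees 50% of the training rows (assumed mapping)
    random_state=0,
)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))  # accuracy on the held-out 30%
```

The printed accuracy refers to the synthetic data only and says nothing about the survey results.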
In the course of assessing the accuracy of the models, 2×2 classification matrices were constructed. The main diagonal of each matrix contains the counts of correctly classified students: true positives (TP) and true negatives (TN), where a TP can be regarded as an "alarm signal", that is, the classifier detects a student with a stress-related condition. The remaining two elements are the counts of incorrectly classified cases: false positives (FP) and false negatives (FN). The accuracy of models can be assessed from these four counts (Xu & Yin, 2021). As a result, the following metrics were used to assess the quality of the models: Accuracy, Precision, Recall, and F1 Score.
Model parameters. The models were constructed by sequentially adding decision trees. Each RF consisted of 100 decision trees. The number of random predictors was 5. The depth of each tree did not exceed 10 levels, and a node could be split into two child nodes only if it contained at least 5 observations (that is, the previous split had placed at least 5 higher education students in it). The leaves of the tree are nodes that have no child nodes and decide which class an object belongs to (Fig. 2).
To implement the tasks, the STATISTICA program (StatSoft, USA) was used, which made it possible to automate the construction of an RF.
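The quality metrics reported for the models (Accuracy, Precision, Recall, and F1 Score) have conventional definitions in terms of the classification matrix counts. The formulas below are the standard ones; they are assumed, not confirmed, to match those used in the study.

```latex
\begin{align*}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN}, \\
\text{Precision} &= \frac{TP}{TP + FP}, \\
\text{Recall}    &= \frac{TP}{TP + FN}, \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}.
\end{align*}
```

Recall is the quantity the text also calls completeness: the share of students who actually have a condition and are detected by the model.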

Results
As shown by previous research (Andrieieva et al., 2023; Byshevets et al., 2023), the level of stress-related conditions in students is determined by numerous factors, both controllable and uncontrollable. Some of them increase the risk of stress-related conditions, while others help students counteract the negative consequences of stress factors. Among the uncontrollable factors is gender, since, according to the literature, female students find it harder than male students to cope with the impact of stress factors. Another uncontrollable harmful factor can be a negative experience of being in the epicenter of hostilities, and a positive one is the direction of education, where we assumed that higher education students in the specialty Physical culture and sport are more engaged in PA. Among the controllable factors, addictive behavior, a sedentary lifestyle, etc. can aggravate stress-related conditions, whereas PA classes, adaptive ways of behavior, etc. help to counteract stress.
We constructed three models aimed at predicting stress-related conditions of students. Figure 3 shows the process of constructing an RF for predicting the stress of students, where the horizontal axis represents the number of trees and the vertical axis represents the proportion of classification errors for the training and test samples. Although 100 trees were constructed, the process largely stabilized after 30 trees. The proportion of classification errors ranges from 0.19 to 0.33 (Fig. 3).
The program allows individual decision trees in an RF to be viewed (Fig. 4). The research made it possible to identify the most significant indicators for predicting stress in students. The main indicator was their PA. PA with high energy consumption, PA with moderate energy consumption, and the number of stressful factors also topped the ranked list of significant features characterizing students. Meanwhile, contrary to our assumptions, gender, negative experience, and the direction of education were at the bottom of the ranking of significant factors for predicting student stress (Fig. 5). Similarly, we constructed and analyzed prediction models for increased anxiety and PTSD symptoms.
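One generic way to obtain such a ranking of predictors is permutation importance: shuffle one predictor at a time and measure how much the prediction accuracy drops. The sketch below illustrates the idea on invented data with a stand-in classifier. It is not the importance measure computed by STATISTICA, and all names and values in it are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Share of rows for which predict(row) equals the true label."""
    hits = sum(predict(r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


def permutation_importance(predict, rows, labels, n_features, rng):
    """Accuracy drop after shuffling each feature column in turn."""
    baseline = accuracy(predict, rows, labels)
    drops = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        drops.append(baseline - accuracy(predict, shuffled, labels))
    return drops


# Hypothetical toy data: feature 0 drives the label, feature 1 is pure noise.
rng = random.Random(1)
rows = [(rng.random(), rng.random()) for _ in range(500)]
labels = [int(pa < 0.5) for pa, _ in rows]


def predict(row):
    # Stand-in for a trained forest: it only looks at feature 0.
    return int(row[0] < 0.5)


drops = permutation_importance(predict, rows, labels, n_features=2, rng=rng)
ranking = sorted(range(2), key=lambda j: drops[j], reverse=True)
print(ranking)  # [0, 1]
```

A predictor that the model ignores, like feature 1 here, produces no accuracy drop and falls to the bottom of the ranking.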
The Risk Estimate calculations showed how likely it is that the proposed models will give an incorrect prediction.Since the Risk Estimate in all cases turned out to be smaller for the training sample than for the test one, it can be asserted that the constructed models passed cross-checking (Table 2).
The classification matrices and model quality assessments are presented in Table 3.
The stress detection model turned out to be the best among the constructed models. It shows good results on all metrics: it correctly predicts the presence or absence of stress in higher education students in 77% of cases (accuracy). The model correctly identifies 86% of the students who actually have stress (recall), and in 79% of the cases in which it flags stress, the student is actually stressed (precision). The F1 score of 0.82 indicates the balance of the model.
As for the "Anxiety Increase" model, despite its average accuracy of 56%, its recall (completeness) of 7% is low: in 93% of cases, students experiencing increased anxiety are classified as those who feel no anxiety. In contrast, in 77% of the cases in which the model identifies increased anxiety, the students actually experience it (precision). The low F1 score indicates that the model is not sufficiently balanced.
The "PTSD Risk" model correctly predicts 78% of all cases. However, it is not good enough at identifying students with PTSD symptoms: it correctly identifies students with signs of PTSD in only 23% of cases and classifies the rest as having no PTSD signs. In 64% of the cases in which the model flags PTSD symptoms, the students actually have them. The F1 score of 0.34 (or 34%) indicates that the model is sufficiently balanced, but it can still be much improved.
The analysis of the indicators significant for prediction showed that negative experience, gender, and the direction of education are the least important variables in the models classifying higher education students with their PA taken into account. At the same time, the general assessment of PA and PA with high energy consumption were among the first and most important variables for establishing that a student has a stress-related condition (Table 4).
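The F1 scores quoted for the models can be cross-checked from the reported precision and recall, since F1 is their harmonic mean. The short Python sketch below does this arithmetic. The value for the anxiety model is derived here from the rounded percentages in the text and is not a figure stated by the authors.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the text for each model.
reported = {
    "Stress": (0.79, 0.86),
    "Increased anxiety": (0.77, 0.07),
    "PTSD risk": (0.64, 0.23),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_score(p, r):.2f}")
# Stress: F1 = 0.82
# Increased anxiety: F1 = 0.13
# PTSD risk: F1 = 0.34
```

The results for the stress and PTSD models agree with the F1 scores of 0.82 and 0.34 given in the text.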
It is worth emphasizing that the program allows the PMML deployment code for the created models to be saved and later used through the Rapid Deployment Engine module to predict the stress-related condition of higher education students who did not participate in the previous research. In effect, this means generating predictions for new observations based on the proposed models.

Discussion
The research showed that the scientific community is actively searching for innovative methods and approaches to improving the PE of students. Authors turn to progressive ideas, such as the application of data mining to identify patterns and predict objects and phenomena in the field of PE (Andrieieva et al., 2022; Jinpeng & Le, 2022; Li & Wang, 2022). Researchers argue convincingly that models built on regression or classification algorithms, which learn from regularities in accumulated data, are more productive and accurate for making predictions in the field of PE (Briand et al., 2022; Jiang et al., 2023). Among such algorithms, the RF method is gaining increasing recognition. It has several advantages over a single tree (Documentation for TIBCO Statistica, 2024; Xu & Yin, 2021; Schonlau & Zou, 2020). The analysis indicates that predictive models constructed by the RF method are used to predict students' academic achievements in PE. This approach to assessment is not only more objective than the traditional one but also helps to better understand the significant factors influencing educational achievements in the discipline, gives a deeper understanding of the patterns of educational activity, and allows timely corrections to the process based on the predicted educational effect (Jiang et al., 2023; Li & Wang, 2022; Nachouki et al., 2023; Palczewska et al., 2013). There is also some evidence that models implemented by the RF method outperform other approaches (Li & Wang, 2022). However, the complexity of interpreting the results of the RF method often prevents its wide use in applied research on PE.
The RF method generates a prediction by combining the predictions of a group of decision trees, which significantly increases its accuracy compared with other decision-tree-based methods; it can therefore be a powerful modern tool for predicting the stress-related condition of students (Li & Wang, 2022; Ding et al., 2022).
It was found that the stress-related conditions of students are most influenced by the PA of students, including PA with high and moderate energy consumption and the number of stressful events they have encountered.These variables were the most significant for predicting stress, increased anxiety, and PTSD symptoms in students.
The fact that gender did not turn out to be a significant factor affecting the stress-related condition of the youth under study was an interesting and unexpected result for us. This contradicts not only the data given in scientific sources (Pavlova et al., 2022; Rogowska & Pavlova, 2023; Levin et al., 2022; Petrachkov et al., 2023) but also our previous research, in which female students reported symptoms of stress-related conditions more often than male students (Andrieieva et al., 2023; Byshevets et al., 2023). It is important to note that previous research either carried out a separate comparative analysis of indicators of stress-related conditions by gender or used respondents' self-reports on the regularity of their activity (No/Rather not/Rather yes/Yes), without taking into account their PA level measured with the IPAQ. The construction of logistic models does not allow taking into account gender or other categorical variables. In addition, the data sets were significantly smaller. Thus, we have every reason to believe that gender is not the only or the main factor affecting stress-related conditions in students. Other variables may moderate or mediate the connection between gender and stress. One of these is the level of PA, which can mitigate the effects of stress factors, as indicated by the results of numerous studies, including ours (Andrieieva et al., 2022, 2023; Byshevets et al., 2023; Kashuba et al., 2021; Steinacker et al., 2023). Therefore, because its positive impact on stress is the same regardless of gender, physical activity can be a factor that reduces the differences in the reaction to stress factors between students of different genders. Perhaps this is why, when we added quantitative indicators of PA to our model, gender ceased to be a significant factor for predicting stress-related conditions of students. This can mean the following:
• PA is a more significant factor influencing the formation and development of a stress-related condition in students than gender;
• gender indirectly affects the stress-related condition of students through PA, i.e., PA acts as a mediator in the mechanism by which gender influences the stress-related condition of students;
• the connection between gender and the stress-related condition of students depends on a moderator, that is, another variable, for example, the direction of education or the negative experience of being in the epicenter of hostilities, which interact with each other and have different effects on their stress-related condition.

Conclusions
Against the background of the continuation of the armed conflict in Ukraine, cases of stress-related conditions are becoming more and more frequent among students.An important task in the field of PE and sports is the prompt identification of students characterized by a stress-related condition.This will make it possible to offer them appropriate measures based on the use of means of PA to mitigate the negative effects of stress factors of increased potential and to prevent the development of stress-related disorders.
It was established that PA is the most important feature that allows the prediction of stress-related conditions in students.In addition, indicators of PA with moderate energy consumption, PA with high energy consumption, as well as the number of stressful events experienced by students are important for predicting stress in students.We constructed the "Stress", "Increased anxiety" and "Risk of PTSD" models, aimed at predicting stress-related conditions of students, using the RF method.The models for predicting increased anxiety and identifying students with PTSD symptoms require further improvement.

Fig. 1. Application scheme of the RF algorithm
Majority voting across many trees makes the model more resistant to overfitting and improves its generalization ability.

Fig. 2. Decision tree structure diagram in a random forest

Fig. 4. An example of individual decision trees for predicting stress of higher education students

Fig. 5.

Table 1. Description of input data

Table 2. Risk estimates for the constructed models

Table 3. Matrices of error classification and accuracy assessment of the built models (based on the test sample)

Table 4. Analysis of the significance of indicators of the stress-related condition of higher education students