Design optimization of university ideological and political education system based on deep learning

Deep learning algorithm and model design
The convolution layer in a convolutional neural network (CNN) can apply several convolution kernels of different sizes to convolve data such as text vectors and capture local features at different granularities (as shown in Fig. 1). In the IPE setting, whether for micro-expression analysis of students or for processing text content, the extraction of local features is very important18. For example, students’ degree of understanding of, and attitude toward, a specific topic can be judged from local text characteristics such as word choice and sentence structure in their discussions, and CNN can accomplish this task efficiently19. For data with a spatial dimension, such as recognizing students’ classroom participation, the convolution layer of a CNN can handle spatial information well and accurately distinguish participation states such as concentration, confusion, and enthusiasm, thus providing a basis for formulating subsequent targeted teaching strategies20,21.

Fig. 1 Typical CNN network structure.
In the process of IPE, students’ learning behavior and cognitive development are sequential and dynamic in time. Long Short-Term Memory (LSTM) networks are particularly good at modeling such sequential data: through their memory units they can capture the dynamic process of students’ knowledge mastery and the long-term patterns of cognitive development22. For example, sequential data such as academic performance across different periods, online learning duration, and homework completion order can be analyzed to predict students’ future learning needs and likely difficulties. When dealing with students’ long-term learning data, LSTM can retain key information from earlier stages and combine it with subsequent information, which is essential for understanding long-term processes such as the evolution of students’ ideas and the gradual formation of their value systems, and helps realize more accurate educational intervention and guidance23,24.
Combining CNN’s spatial feature extraction with LSTM’s time-series processing enables spatio-temporal feature fusion of ideological and political education data25,26. This fusion allows the model to understand students’ learning status and needs more comprehensively and deeply, for example by jointly considering students’ real-time expressions in class (spatial features) and their performance trends across learning stages (temporal features), so as to dynamically generate personalized teaching programs and shift from “standardized supply” to “precise drip irrigation”. CNN focuses on local and spatial features, while LSTM handles sequential and temporal features; the two complement each other and overcome the limitations of a single model in dealing with complex educational data27. This complementarity improves performance in tasks such as analyzing students’ learning behavior and predicting their learning needs. Compared with a CNN or LSTM model alone, the CNN-LSTM hybrid model captures the complex relationships and regularities inherent in the data more accurately and provides more precise, targeted support for the ideological and political education system.
In contrast, other deep learning models have clear drawbacks here. A traditional feedforward neural network (FNN) cannot effectively handle sequence data or local features. A simple recurrent neural network (RNN) can process sequence data but struggles to capture long-term dependencies and suffers from vanishing or exploding gradients during training. The Transformer architecture performs well on long sequences and global dependencies, but its computational complexity is high, and it may be less efficient and less targeted than a CNN-LSTM model in IPE scenarios with pronounced spatio-temporal structure and moderately sized data28. Therefore, given the characteristics of IPE data and practical application requirements, choosing CNN and LSTM to build a hybrid model is a reasonable and effective way to optimize the university IPE system.
This investigation seeks to enhance the structure of university IPE systems by employing deep learning technology. To realize this objective, the research introduces a hybrid model that synergizes CNN with LSTM, capitalizing on CNN’s prowess in feature extraction and LSTM’s competence in managing sequential data29,30. Figure 2 provides an illustration of this concept.

Fig. 2 The process of using the CNN-LSTM hybrid model to process IPE data.
The CNN-LSTM hybrid model processes IPE data as follows. First, the collected IPE-related data are preprocessed. This includes text cleaning, word segmentation, stop-word removal, and other steps to ensure data quality and consistency. In addition, the word-embedding technique Word2Vec is used to convert the text data into vector representations suitable as input to the deep learning model31,32.
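As a concrete illustration, a minimal preprocessing sketch is given below. It assumes the jieba library for Chinese word segmentation and a local one-word-per-line stop-word file; both are illustrative choices rather than components prescribed by the system.

```python
import re
import jieba  # common Chinese word-segmentation library; an assumption, not mandated here

def load_stopwords(path="stopwords.txt"):
    # "stopwords.txt" is a hypothetical one-word-per-line stop-word list.
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

def preprocess(text, stopwords):
    # Text cleaning: keep Chinese characters, letters, and digits only.
    text = re.sub(r"[^\u4e00-\u9fa5A-Za-z0-9]", " ", text)
    # Word segmentation, then stop-word removal.
    return [t for t in jieba.lcut(text) if t.strip() and t not in stopwords]

stopwords = load_stopwords()
raw_documents = ["课堂讨论中的一条学生发言", "一份学生作业的文本"]  # placeholder texts
corpus = [preprocess(doc, stopwords) for doc in raw_documents]
```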
Word2Vec is a language model that learns semantic knowledge from a large text corpus in an unsupervised way. Its main purpose is to map words in natural language to mathematical vectors that computers can understand and process. Because words are represented as vectors, relationships between words can be measured quantitatively and mined; the word vectors generated by Word2Vec capture semantic information, so words with similar meanings have similar vectors.
There are two main ways to realize the Word2Vec model: CBOW (Continuous Bag-of-Words) and Skip-gram. The CBOW model predicts the center word from its surrounding context words33, while the Skip-gram model predicts the surrounding context words from the center word, as shown in Fig. 3. Both can be trained with open-source libraries such as Gensim.

Fig. 3 Word2Vec language model structure.
To convert text into vectors with Word2Vec, the original text is first preprocessed, including word segmentation, stop-word removal, stemming, and other operations. All words in the preprocessed text are collected to build a vocabulary, and each word in the vocabulary is assigned a unique integer identifier. The preprocessed text is then converted into training data, in which each training sample consists of a center word and its surrounding context words. The Word2Vec model is trained on these data; after training, the vector of any word can be obtained by querying the vocabulary. Word vectors generated by Word2Vec are widely used in natural language processing, including word similarity calculation, text classification, part-of-speech tagging, named entity recognition, machine translation, and text generation. When the CNN-LSTM hybrid model is used to process IPE data, converting the text into vector representations with Word2Vec is an important preprocessing step that helps the deep learning model understand and process the text34.
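Since Gensim is named above as a training option, a minimal sketch of the Word2Vec step might look as follows; the vector size, window, and epoch count are illustrative values, not tuned settings from this study.

```python
from gensim.models import Word2Vec

# `corpus` is a list of token lists produced by the preprocessing step above.
corpus = [["思政", "课程", "讨论"], ["学生", "作业", "提交"]]

# sg=0 trains CBOW (context -> center word); sg=1 trains Skip-gram.
model = Word2Vec(sentences=corpus, vector_size=100, window=5,
                 min_count=1, sg=0, epochs=10)

vec = model.wv["课程"]                           # query a word vector from the vocabulary
similar = model.wv.most_similar("课程", topn=3)  # semantically nearest neighbors
```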
The input layer receives the preprocessed text vectors as input. The convolution layer applies several convolution kernels of different sizes to the text vectors to capture local features at different granularities (Fig. 4).

Fig. 4 Convolution transformation process.
The mathematical expression of the convolution operation is:
$${c_i}=f\left( {W \cdot {x_{i:i+h – 1}}+b} \right)$$
(1)
Where \({c_i}\) is the feature map after the convolution operation, W is the weight matrix of the convolution kernel, \({x_{i:i+h – 1}}\) is the subsequence from position i to \(i+h – 1\) in the input text vector, b is the bias term, and f is the ReLU activation function.
The pooling layer performs a maximum pooling operation on the feature map output by the convolution layer, extracting the most important features and reducing dimensionality35. The mathematical expression of maximum pooling is:
$${p_i}=\hbox{max} \left( {{c_{2i – 1}},{c_{2i}}} \right)$$
(2)
Where \({p_i}\) is the output of the pooling operation and \({c_{2i – 1}},{c_{2i}}\) are adjacent outputs of the convolution layer.
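For concreteness, a minimal NumPy sketch of Eqs. (1) and (2) is given below; the sequence length, embedding dimension, and weight values are illustrative assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def conv1d(x, W, b):
    # x: (seq_len, emb_dim) text vectors; W: (h, emb_dim) kernel; b: scalar bias.
    h = W.shape[0]
    # c_i = f(W . x_{i:i+h-1} + b), Eq. (1)
    return np.array([relu(np.sum(W * x[i:i + h]) + b)
                     for i in range(x.shape[0] - h + 1)])

def max_pool_pairs(c):
    # p_i = max(c_{2i-1}, c_{2i}), Eq. (2); a trailing odd element is dropped.
    n = len(c) // 2
    return c[:2 * n].reshape(n, 2).max(axis=1)

x = np.random.randn(10, 8)   # 10 words, 8-dimensional embeddings (illustrative)
W = np.random.randn(5, 8)    # kernel size h = 5
c = conv1d(x, W, b=0.1)      # feature map of length 6
p = max_pool_pairs(c)        # pooled features of length 3
```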
The LSTM layer receives the output of the pooling layer as sequence input and captures long-term dependencies in the sequence through LSTM units (Fig. 5).

The mathematical model of the LSTM unit involves multiple gating mechanisms and a cell-state update; the specific formulas are as follows:
$${i_t}=\sigma \left( {{W_{xi}}{x_t}+{W_{hi}}{h_{t – 1}}+{b_i}} \right)$$
(3)
$${f_t}=\sigma \left( {{W_{xf}}{x_t}+{W_{hf}}{h_{t – 1}}+{b_f}} \right)$$
(4)
$${o_t}=\sigma \left( {{W_{xo}}{x_t}+{W_{ho}}{h_{t – 1}}+{b_o}} \right)$$
(5)
$${g_t}=\tanh \left( {{W_{xc}}{x_t}+{W_{hc}}{h_{t – 1}}+{b_c}} \right)$$
(6)
$${c_t}={f_t} \otimes {c_{t – 1}}+{i_t} \otimes {g_t}$$
(7)
$${h_t}={o_t} \otimes \tanh \left( {{c_t}} \right)$$
(8)
Where \({x_t}\) is the current input, \({h_{t – 1}}\) and \({c_{t – 1}}\) are the hidden state and cell state at the previous time step, \(W\) and \(b\) denote the weight matrices and bias terms, σ is the sigmoid activation function, and ⊗ denotes element-wise multiplication.
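The gating equations can be mirrored almost line for line in NumPy; the following sketch of a single LSTM step uses illustrative dimensions and random weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    i_t = sigmoid(p["Wxi"] @ x_t + p["Whi"] @ h_prev + p["bi"])   # Eq. (3)
    f_t = sigmoid(p["Wxf"] @ x_t + p["Whf"] @ h_prev + p["bf"])   # Eq. (4)
    o_t = sigmoid(p["Wxo"] @ x_t + p["Who"] @ h_prev + p["bo"])   # Eq. (5)
    g_t = np.tanh(p["Wxc"] @ x_t + p["Whc"] @ h_prev + p["bc"])   # Eq. (6)
    c_t = f_t * c_prev + i_t * g_t                                # Eq. (7)
    h_t = o_t * np.tanh(c_t)                                      # Eq. (8)
    return h_t, c_t

d_in, d_h = 3, 4  # input and hidden sizes (illustrative)
rng = np.random.default_rng(0)
p = {k: rng.standard_normal((d_h, d_in if k[1] == "x" else d_h))
     for k in ["Wxi", "Whi", "Wxf", "Whf", "Wxo", "Who", "Wxc", "Whc"]}
p.update({k: np.zeros(d_h) for k in ["bi", "bf", "bo", "bc"]})
h, c = np.zeros(d_h), np.zeros(d_h)
h, c = lstm_step(rng.standard_normal(d_in), h, c, p)  # one time step
```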
The output layer applies the softmax function to classify the output of the LSTM layer. The mathematical expression of the softmax function is:
$$P\left( {y=j\left| x \right.} \right)=\frac{{{e^{{x_j}}}}}{{\sum\nolimits_{{k=1}}^{K} {{e^{{x_k}}}} }}$$
(9)
Where x is the output vector of the LSTM layer, j is the index of category labels, and K is the total number of categories.
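As a quick check of Eq. (9), the function can be written in a few lines of NumPy; subtracting the maximum is a standard numerical-stability device, not part of the equation itself.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # stabilized exponentials
    return e / e.sum()         # Eq. (9): normalized over K categories

probs = softmax(np.array([2.0, 1.0, 0.1]))  # sums to 1 across K = 3 classes
```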
The model is trained with the backpropagation algorithm and a gradient descent optimizer, adjusting the model parameters by minimizing the cross-entropy loss function:
$$L= – \sum\limits_{{i=1}}^{N} {{y_i}\log \left( {{p_i}} \right)}$$
(10)
Where N is the number of samples, \({y_i}\) is the one-hot encoding vector of the true label, and \({p_i}\) is the predicted probability distribution vector. During training, early stopping and regularization techniques are also used to prevent over-fitting, and a learning-rate decay strategy is used to optimize the training process.
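Putting Eqs. (1)-(10) and these training techniques together, a hedged end-to-end sketch in TensorFlow/Keras might look as follows. The framework choice, layer widths, sequence length, and number of classes are assumptions; the kernel size of 5, two convolution layers, one LSTM layer, batch size 128, base learning rate 0.001, and regularization strength 0.01 follow the tuned values reported in the hyperparameter section below.

```python
import tensorflow as tf

SEQ_LEN, EMB_DIM, NUM_CLASSES = 100, 128, 4   # illustrative shapes

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQ_LEN, EMB_DIM)),     # Word2Vec vectors in
    tf.keras.layers.Conv1D(64, 5, activation="relu"),    # Eq. (1), kernel size 5
    tf.keras.layers.MaxPooling1D(2),                     # Eq. (2)
    tf.keras.layers.Conv1D(64, 5, activation="relu"),    # second convolution layer
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.LSTM(64),                            # Eqs. (3)-(8), one LSTM layer
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax",        # Eq. (9)
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
])

# Learning-rate decay plus early stopping, as described above.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.001, decay_steps=1000, decay_rate=0.9)
model.compile(optimizer=tf.keras.optimizers.Adam(lr_schedule),
              loss="categorical_crossentropy",           # Eq. (10)
              metrics=["accuracy"])
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True)
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           batch_size=128, epochs=50, callbacks=[early_stop])
```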
Design optimization process of the IPE system
This study aims to optimize the design of the university IPE system through deep learning technology. To achieve this goal, a hybrid model combining CNN and LSTM is adopted. The architecture of the university IPE system is shown in Fig. 6.

Fig. 6 Architecture of the university IPE system.
First, university IPE-related data are collected from multiple channels, including lecture notes, students’ homework, discussion forums, and online test records. These data cover students’ various behaviors in the IPE process and are essential for building an accurate deep learning model.
The collected raw data require a series of preprocessing operations to meet the input requirements of the deep learning model. Preprocessing steps include text cleaning (removing irrelevant characters, punctuation marks, etc.), word segmentation (dividing the text into independent words or phrases), stop-word removal (deleting common words that contribute little meaning), and stemming or lemmatization (converting words into their base forms). In addition, the text data are transformed into vector representations by word-embedding technology so that the model can understand and process them.
After the data are preprocessed, a CNN-LSTM hybrid model is constructed. The model combines CNN’s advantage in feature extraction with LSTM’s ability to process sequence data: the CNN layers extract local features from the text data, while the LSTM layer captures long-term dependencies in the text sequence. Through this hybrid architecture, the model can understand the text data more comprehensively and extract information that is instructive for IPE.
The subsequent step involves employing the preprocessed data for model training. During this phase, both the backpropagation algorithm and the gradient descent optimizer are utilized to fine-tune the model’s parameters, thereby minimizing prediction errors. Concurrently, measures to counteract overfitting and augment the model’s generalization capabilities are examined, including early stopping (which halts training once the validation error starts to increase) and regularization (such as L1 and L2 regularization). Moreover, a learning-rate decay strategy is implemented to dynamically adjust the learning rate, accelerating the model’s convergence and enhancing the training outcome.
After training, the deep learning model is integrated into the existing IPE system. The model’s predictions are provided as auxiliary information to teachers and education administrators to help them understand students’ learning situations and needs more comprehensively. For example, the model can predict students’ performance in an ideological and political course, providing personalized teaching suggestions for teachers. At the same time, it can analyze students’ online communication data to uncover potential problems and confusion, so that education administrators can intervene and provide help in time. In this way, the deep learning model can effectively raise the intelligence level of the IPE system and provide students with more accurate and personalized learning support.
Experimental setup
Data set
This study uses a multi-channel data set related to university IPE, including lecture notes, students’ homework, exchange records from discussion forums, and online test scores. The data set is about 5 GB in size and contains millions of records covering the learning behaviors, interactions, and test scores of thousands of students. It shows a high degree of diversity along many dimensions, such as data sources, student backgrounds, and learning behaviors.
The text is cleaned by removing irrelevant characters and punctuation marks, keeping only the textual information useful for model training. A Chinese word segmentation tool is used to cut the text into independent words or phrases, and stop words are deleted to reduce noise. Words are converted into their base forms and represented uniformly through stemming or lemmatization. Finally, Word2Vec is used to convert the processed text into vector representations.
Hyperparameter optimization of the CNN-LSTM model
To optimize the performance of the CNN-LSTM hybrid model, a combination of grid search and random search is used for hyperparameter optimization. Grid search is an exhaustive method that traverses all hyperparameter combinations within a specified range to find the optimal one. Random search samples hyperparameter combinations at random within a specified range; although it may miss the optimum, its computational cost is low, making it suitable for preliminary screening over a large range.
Several key hyperparameters are adjusted to optimize performance, including the learning rate (0.0001 to 0.1), batch size (32 to 256), convolution kernel size (3 to 7), number of convolution layers (1 to 3), number of LSTM layers (1 to 2), and regularization strength (0 to 0.1). After many experiments, the final hyperparameter values are: learning rate 0.001, batch size 128, convolution kernel size 5, two convolution layers, one LSTM layer, and regularization strength 0.01.
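A minimal sketch of the random-search screening stage over these ranges is shown below; the trial count and the train_and_score placeholder are illustrative assumptions, standing in for actual training of the CNN-LSTM.

```python
import random

random.seed(0)

SPACE = {
    "learning_rate": lambda: 10 ** random.uniform(-4, -1),   # 0.0001 to 0.1, log scale
    "batch_size":    lambda: random.choice([32, 64, 128, 256]),
    "kernel_size":   lambda: random.choice([3, 5, 7]),
    "conv_layers":   lambda: random.randint(1, 3),
    "lstm_layers":   lambda: random.randint(1, 2),
    "reg_strength":  lambda: random.uniform(0.0, 0.1),
}

def train_and_score(cfg):
    # Placeholder: in the real pipeline this trains the CNN-LSTM with the
    # sampled configuration and returns its validation accuracy.
    return random.random()

best_cfg, best_score = None, -1.0
for _ in range(30):                      # coarse random screening stage
    cfg = {name: sample() for name, sample in SPACE.items()}
    score = train_and_score(cfg)
    if score > best_score:
        best_cfg, best_score = cfg, score
print(best_cfg, best_score)              # candidates for a finer grid search
```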
A five-fold cross-validation protocol is used. The specific steps are as follows (a code sketch follows the list):
(1) Randomly divide the data set into five equal parts.
(2) Each time, select four parts as the training set and use the remaining part as the validation set.
(3) Train the model on each training set and evaluate its performance on the corresponding validation set.
(4) Repeat the above steps five times, choosing a different validation set each time.
(5) Take the average performance over the five validations as the final performance index of the model.
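A minimal sketch of this five-fold protocol, using scikit-learn’s KFold for the splitting, is given below; the data arrays and the train_and_evaluate placeholder are assumptions standing in for the vectorized IPE data and the tuned CNN-LSTM.

```python
import numpy as np
from sklearn.model_selection import KFold

rng = np.random.default_rng(42)
X = rng.standard_normal((500, 64))   # placeholder feature matrix
y = rng.integers(0, 4, size=500)     # placeholder class labels

def train_and_evaluate(X_tr, y_tr, X_val, y_val):
    # Placeholder: here the tuned CNN-LSTM would be trained on the four
    # training folds and its accuracy measured on the validation fold.
    return float(rng.random())

scores = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=42).split(X):
    scores.append(train_and_evaluate(X[train_idx], y[train_idx],
                                     X[val_idx], y[val_idx]))

print("mean of five validations:", np.mean(scores))  # step (5)
```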
Data privacy and ethical issues
All student data are anonymized at the collection stage to prevent direct identification of individuals. Only data directly related to the research are collected, avoiding unnecessary personal information. Encryption is used to protect the data during transmission and storage. Before data collection, students are clearly informed of the purpose of collection and the manner of use, and their consent is obtained. Strict access control ensures that only authorized researchers can access the data, further safeguarding students’ privacy and data security.
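As one hedged illustration of the anonymization step, student identifiers can be replaced with salted one-way hashes; the salt handling shown here is simplified and not the project’s actual key-management scheme.

```python
import hashlib

SALT = b"keep-this-secret-and-out-of-version-control"  # illustrative salt handling

def pseudonymize(student_id: str) -> str:
    # One-way hash: records remain linkable across tables without exposing IDs.
    return hashlib.sha256(SALT + student_id.encode("utf-8")).hexdigest()

record = {"student": pseudonymize("20250001"), "score": 87}
```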