20 March 2024
Volume 59 Issue 3
Heterogeneous network representation learning based on metapath attribute fusion
WANG Jinghong, WU Zhibing, HUANG Peng, YANG Jiateng, LI Bi
2024, 59(3):  1-13.  doi:10.6040/j.issn.1671-9352.7.2023.787
To address representation learning on heterogeneous information networks, a metapath attribute fusion graph neural network (MAFGNN) is proposed. Before metapaths are introduced, the model integrates the neighbor information of each target node, including metapath information, into the node itself, so that target-node and neighbor information are fused. The method first projects the attribute features of different node types into a common dimension to facilitate subsequent fusion. Target-node information is then fused by computing weights between target nodes and their neighbor nodes. Next, target nodes are fused along specific metapaths, and finally the semantic information of different metapaths is fused. Experiments on multiple heterogeneous information network datasets show that MAFGNN outperforms state-of-the-art baselines on heterogeneous network node embedding and yields more accurate predictions.
Heuristic construction method of fuzzy concept set and its recommended application
LIU Zhonghui, JIANG Shuai, MIN Fan
2024, 59(3):  14-26.  doi:10.6040/j.issn.1671-9352.7.2023.9950
To address the difficulty of applying fuzzy formal concept analysis to large-scale datasets in recommendation applications, a recommendation method based on the heuristic construction of a fuzzy concept set is proposed. First, a sub-context is constructed for each user based on the similarity between users. Then, new heuristic information is applied to the sub-contexts to generate fuzzy concepts with users and items as clues, respectively. Finally, using the internal information of the fuzzy concepts, a recommendation confidence that incorporates user weights is designed to achieve personalized recommendation. Experimental results on six real datasets show that the proposed method is more efficient and achieves better recommendation results on sparse datasets than classical collaborative filtering algorithms.
Fuzzy border-peeling clustering
SUN Jiarui, DU Mingjing
2024, 59(3):  27-36.  doi:10.6040/j.issn.1671-9352.4.2023.040
A fuzzy border-peeling clustering (FBP) algorithm is proposed. First, a density estimation method based on the Cauchy kernel is used to calculate the densities of data points. Second, the boundary data are separated from the core data using a layer-by-layer peeling strategy. Third, the reachability between core data points is used to cluster the core region. Finally, a fuzzy assignment strategy is used to achieve a soft partitioning of the boundary data. FBP is compared with 10 benchmark algorithms, including 6 density-based clustering algorithms and 4 fuzzy clustering algorithms, on artificial and real-world datasets. The experimental results show that, across all datasets, FBP improves the ARI (adjusted Rand index) by 21% to 60% on average and the NMI (normalized mutual information) by 12% to 47% on average. Optimizing the border-peeling clustering algorithm with the Cauchy kernel and a fuzzy assignment strategy thus significantly improves clustering accuracy.
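The density step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the estimator's exact form, so the code below assumes the common unnormalised Cauchy kernel 1 / (1 + (d/h)^2) summed over all other points; the function name `cauchy_kernel_density` and the `bandwidth` parameter are hypothetical and not taken from the paper.

```python
import math


def cauchy_kernel_density(points, bandwidth=1.0):
    """Return an unnormalised density score for every point.

    Assumption (not from the paper): the score of a point is the sum of
    Cauchy kernel values 1 / (1 + (d / h)^2) over all other points, where
    d is the Euclidean distance and h is the bandwidth.
    """
    densities = []
    for i, p in enumerate(points):
        total = 0.0
        for j, q in enumerate(points):
            if i == j:
                continue  # a point does not contribute to its own density
            d = math.dist(p, q)
            total += 1.0 / (1.0 + (d / bandwidth) ** 2)
        densities.append(total)
    return densities


if __name__ == "__main__":
    sample = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
    print(cauchy_kernel_density(sample))
```

In a border-peeling scheme, points with low scores of this kind would be candidates for the boundary layers that are peeled first; how the paper thresholds them is not stated in the abstract.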
Optimization of hydrogeological parameters based on improved butterfly optimization algorithm
WEI Xiuxi, PENG Maosong, HUANG Huajuan
2024, 59(3):  37-50.  doi:10.6040/j.issn.1671-9352.7.2023.3667
To address the insufficient accuracy of hydrogeological parameter estimates and the low efficiency of the traditional type-curve matching (wiring) method, a hydrogeological parameter optimization strategy based on a golden sine weighted butterfly optimization algorithm (GSWBOA) is proposed. First, the golden sine operator is introduced in the global and local search phases of the butterfly optimization algorithm to reduce the solution space of the algorithm. Second, adaptive weights are introduced to adjust the individual step size and search direction in the later stage of the algorithm. Comparison tests on 6 benchmark functions show that GSWBOA has higher optimization accuracy and faster convergence. The strategy is then applied to optimize the hydrogeological parameters transmissivity (water conductivity coefficient) and storage coefficient so as to minimize the drawdown error, and it is compared with the particle swarm optimization algorithm, the type-curve matching method, and other optimization strategies. The results show that GSWBOA can effectively optimize the hydrogeological parameters, improve the calculation performance of the Theis formula, and obtain a smaller drawdown error, which provides a new method for subsequent pumping tests.
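The kind of objective such an optimizer minimizes can be sketched from the standard Theis solution, s = Q / (4πT) · W(u) with u = r²S / (4Tt). This is a hedged illustration only: the function names, the series evaluation of the well function W(u), and the root-mean-square error measure are assumptions for the example, not the paper's implementation, and the series is intended for small to moderate u (roughly u below 5).

```python
import math

EULER_GAMMA = 0.5772156649015329


def well_function(u, terms=60):
    """Theis well function W(u) via its series expansion.

    W(u) = -gamma - ln(u) + sum_{n>=1} (-1)^(n+1) * u^n / (n * n!)
    Requires u > 0; accurate for small to moderate u.
    """
    total = -EULER_GAMMA - math.log(u)
    term = 1.0  # running value of u^n / n!
    for n in range(1, terms + 1):
        term *= u / n
        total += (-1) ** (n + 1) * term / n
    return total


def theis_drawdown(Q, T, S, r, t):
    """Drawdown at distance r and time t for pumping rate Q,
    transmissivity T and storage coefficient S (consistent units)."""
    u = r * r * S / (4.0 * T * t)
    return Q / (4.0 * math.pi * T) * well_function(u)


def drawdown_rmse(T, S, Q, r, times, observed):
    """Root-mean-square error between observed and computed drawdown.
    A candidate (T, S) pair with a smaller value fits the test better."""
    sq = [(theis_drawdown(Q, T, S, r, t) - s) ** 2
          for t, s in zip(times, observed)]
    return math.sqrt(sum(sq) / len(sq))


if __name__ == "__main__":
    times = [60.0, 600.0, 3600.0]  # seconds, illustrative values only
    obs = [theis_drawdown(0.01, 0.005, 1e-4, 50.0, t) for t in times]
    print(drawdown_rmse(0.005, 1e-4, 0.01, 50.0, times, obs))
```

Any population-based optimizer, GSWBOA included, could treat `drawdown_rmse` as the fitness of a candidate (T, S) pair.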
Grey wolf optimization algorithm based on multi-strategy combination and its application
QIN Hongwu, WANG Lizheng, FU Yu, SUI Muxuan, HE Binggao
2024, 59(3):  51-60.  doi:10.6040/j.issn.1671-9352.7.2023.4633
The standard grey wolf optimizer (GWO) algorithm has difficulty balancing global exploration and local exploitation. To address this problem, a multi-strategy grey wolf optimization algorithm (MSGWO) that combines several strategies is presented. First, the Tent map and a nonlinear convergence factor are introduced into GWO. Then, to coordinate the search during optimization, three learning strategies are applied: extensive learning, elite learning, and coordinated learning. Finally, roulette wheel selection is used to choose among the strategies, yielding more diverse wolf positions and globally representative individuals, and benchmark functions are used to compare the algorithm variants. The results demonstrate that MSGWO converges faster and achieves a good balance between local exploitation and global search. On this basis, MSGWO is used to optimize the hyperparameters of an echo state network (ESN) for regression prediction. The experiment shows that the MSGWO-optimized model performs best, with a mean absolute percentage error of 0.38% and a goodness of fit of 0.98.
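Two of the ingredients named in this abstract, Tent-map initialisation and roulette wheel selection, can be sketched as follows. The abstract gives no formulas, so the skewed Tent map with breakpoint `a = 0.7`, the seed value `x0`, and all function names are assumptions chosen for illustration rather than the paper's settings.

```python
import random


def tent_map_sequence(x0, n, a=0.7):
    """Generate n values in [0, 1] with a skewed Tent map.

    Assumed form: x <- x / a if x < a, else (1 - x) / (1 - a).
    """
    xs = []
    x = x0
    for _ in range(n):
        x = x / a if x < a else (1.0 - x) / (1.0 - a)
        xs.append(x)
    return xs


def tent_init_population(n_wolves, dim, lower, upper, x0=0.37):
    """Spread an initial population over [lower, upper] using the chaotic
    sequence instead of uniform random numbers."""
    chaos = tent_map_sequence(x0, n_wolves * dim)
    return [
        [lower + chaos[i * dim + d] * (upper - lower) for d in range(dim)]
        for i in range(n_wolves)
    ]


def roulette_select(weights, rng):
    """Return an index with probability proportional to its weight."""
    pick = rng.random() * sum(weights)
    cumulative = 0.0
    for idx, w in enumerate(weights):
        cumulative += w
        if pick < cumulative:
            return idx
    return len(weights) - 1  # guard against floating-point round-off


if __name__ == "__main__":
    rng = random.Random(0)
    print(tent_init_population(3, 2, -10.0, 10.0))
    # Hypothetical weights for three learning strategies.
    print(roulette_select([0.5, 0.3, 0.2], rng))
```

Here the weights passed to `roulette_select` would stand for the selection scores of the three learning strategies; how the paper computes those scores is not stated in the abstract.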
Hierarchical feature selection algorithm based on instance correlations
SHI Chunyu, MAO Yu, LIU Haoyang, LIN Yaojin
2024, 59(3):  61-70.  doi:10.6040/j.issn.1671-9352.7.2023.1073
A hierarchical feature selection algorithm based on instance correlations (HFSIC) is proposed to further improve the performance of hierarchical feature selection. After sparse regularization terms are used to remove irrelevant features, the parent-child relationship in the hierarchical structure is combined with the reconstruction relationship between samples in the feature space. The correlation between samples of each category under the same subtree is learned, and recursive regularization is used to optimize the output feature weight matrix. When the sample correlation is measured, the reconstructed coefficient matrix is integrated into the training model, and the norm is used to remove irrelevant and redundant features. The optimization problem of the proposed model is solved using the accelerated proximal gradient method, and the proposed algorithm is evaluated under multiple evaluation metrics. The experimental results show that the proposed method outperforms the other algorithms on five datasets, which verifies its effectiveness.
Entity disambiguation method based on graph attention networks
NIU Zequn, LI Xiaoge, QIANG Chengyu, HAN Wei, YAO Yi, LIU Yang
2024, 59(3):  71-80.  doi:10.6040/j.issn.1671-9352.1.2022.4484
We propose an entity disambiguation method based on graph attention networks for semi-structured knowledge base data. First, a global knowledge graph is constructed from the semi-structured knowledge base, and the entity reference items are embedded with a pre-trained BERT model. Next, a graph attention network, which leverages masked self-attention layers, is applied to the candidate entity nodes of the global knowledge graph to obtain node-level vectors. Finally, we compute and rank the similarity scores between the entity reference items and the candidate entities to complete the entity disambiguation task. Experiments on the CCKS2019 dataset show that the method achieves state-of-the-art results.
Emoji embedded representation based on emotion distribution
ZENG Xueqiang, SUN Yu, LIU Ye, WAN Zhongying, ZUO Jiali, WANG Mingwen
2024, 59(3):  81-94.  doi:10.6040/j.issn.1671-9352.1.2022.3548
This paper proposes an emoji embedded representation method based on emotion distribution (EDEER). EDEER uses the soft labels of a BERT-based emotion prediction model to learn emoji embedded representations from real data, and directly models the degree to which each emoji expresses various sentiments through an emotion distribution, so that the embedded representation captures the diverse emotional information of emoji. Multiple sets of comparative experiments on a Chinese Weibo dataset containing emoji show that the proposed method can effectively learn emoji embedded representations that are directly related to fine-grained sentiments and build an emoji representation space with high emotional expression quality.
Identification and statistical analysis methods of personal information disclosure in open government data
CHEN Haisu, LIAO Jiachun, YAO Sicheng
2024, 59(3):  95-106.  doi:10.6040/j.issn.1671-9352.7.2023.2681
To promote the protection of personal information when data are opened to the public, the current state of personal information disclosure in open government data is analyzed in depth. First, datasets are obtained from relevant platforms and pre-processed, and those containing personal information are classified based on features such as field names and table names. Then, sensitive information identification methods are applied to identify and extract the various types of personal information in the data, and the information is mapped back to individuals to count the total number of individuals and detect their associated data. Data visualizations are used to examine the current state of personal information disclosure. Although some open government data platforms may have implemented measures such as data categorization and de-identification, the published datasets still contain a large amount of personal information. Data categorization and classification, sensitive information identification, and data desensitization therefore need to be carried out in a more standardized and accurate manner.
The ML-KNN method based on attribute weighting
WEN Xin, LI Deyu
2024, 59(3):  107-117.  doi:10.6040/j.issn.1671-9352.2.2023.027
An ML-KNN method based on attribute weighting is proposed. Specifically, for each label, samples in the non-positive regions of the decision classes are first identified by means of the variable precision neighborhood rough set model, and heterogeneous sample pairs are constructed. Then, the significance of each attribute for classification is evaluated based on its ability to discern the heterogeneous sample pairs. Finally, the weighted distances between samples are calculated to obtain the nearest neighbor distributions of the samples, and multi-label classification is performed based on the principle of maximizing the posterior probability. Experimental results on ten public multi-label datasets verify the effectiveness of the proposed method.
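The effect of attribute weights on the neighbor search can be seen in a small sketch. It assumes a weighted Euclidean distance and takes the weights as given; deriving them from heterogeneous sample pairs, as the abstract describes, is not reproduced here, and both function names are hypothetical.

```python
import math


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_k w_k * (x_k - y_k)^2).
    Assumed form; the paper's exact distance is not given in the abstract."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def k_nearest(query, samples, weights, k):
    """Indices of the k samples closest to the query under the weights."""
    order = sorted(
        range(len(samples)),
        key=lambda i: weighted_distance(query, samples[i], weights),
    )
    return order[:k]


if __name__ == "__main__":
    samples = [(0.0, 0.0), (3.0, 10.0), (10.0, 1.0)]
    query = (9.0, 9.0)
    print(k_nearest(query, samples, (1.0, 1.0), 1))  # both attributes count
    print(k_nearest(query, samples, (1.0, 0.0), 1))  # second attribute ignored
```

Changing the weights changes which sample is nearest, which is why attribute significance feeds directly into the neighbor label counts that ML-KNN uses for its posterior estimate.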
A new probabilistic hesitant fuzzy multi-attribute group decision making method based on improved distance measures
LIU Mengdi, ZHANG Xianyong, MO Zhiwen
2024, 59(3):  118-126.  doi:10.6040/j.issn.1671-9352.7.2023.4667
To address the multi-attribute group decision making problem with known attribute weights in probabilistic hesitant fuzzy environments, the hesitation degrees of probabilistic hesitant fuzzy sets are taken into account, and a new probabilistic hesitant fuzzy multi-attribute group decision making method is proposed based on improved distance measures. First, by combining the traditional probabilistic hesitant fuzzy distance measures with hesitation degrees through information fusion, improved probabilistic hesitant fuzzy distance measures are defined, including the Hamming distance, the Euclidean distance, and the generalized Euclidean distance. These new measures depend on combination coefficients to achieve theoretical expansion and fusion optimization, and the magnitude relationships and parameter monotonicity of the distance measures are studied. Second, based on the improved distance measures, a new multi-attribute group decision making method is constructed using the technique for order preference by similarity to ideal solution (TOPSIS), and a company location example is used to illustrate decision selection. The effectiveness of the proposed method is shown by parameter analysis and decision comparison. This research systematically deepens the study of probabilistic hesitant fuzzy distance measures and effectively enriches multi-attribute group decision making methods.
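The TOPSIS ranking step can be sketched with a simple distance in place of the paper's improved measures. The sketch assumes a basic Hamming-type distance between probabilistic hesitant fuzzy elements of equal length, each written as a list of (membership, probability) pairs; it omits the hesitation-degree term and combination coefficients that the abstract mentions, and all names are hypothetical.

```python
def phfe_hamming(h1, h2):
    """Hamming-type distance between two elements of equal length.

    Assumed form: mean of |g1 * p1 - g2 * p2| over pairs sorted by value.
    """
    a, b = sorted(h1), sorted(h2)
    if len(a) != len(b):
        raise ValueError("elements must have the same number of pairs")
    return sum(abs(g1 * p1 - g2 * p2)
               for (g1, p1), (g2, p2) in zip(a, b)) / len(a)


def topsis_closeness(alternatives, positive_ideal, negative_ideal, weights):
    """Closeness coefficient d_neg / (d_pos + d_neg) per alternative.
    Larger values mean the alternative is nearer the positive ideal."""
    scores = []
    for alt in alternatives:
        d_pos = sum(w * phfe_hamming(h, ideal)
                    for w, h, ideal in zip(weights, alt, positive_ideal))
        d_neg = sum(w * phfe_hamming(h, ideal)
                    for w, h, ideal in zip(weights, alt, negative_ideal))
        total = d_pos + d_neg
        scores.append(d_neg / total if total > 0 else 0.0)
    return scores


if __name__ == "__main__":
    # One attribute, illustrative values only.
    pos = [[(1.0, 1.0)]]
    neg = [[(0.0, 1.0)]]
    alts = [[[(0.8, 1.0)]], [[(0.3, 1.0)]]]
    print(topsis_closeness(alts, pos, neg, [1.0]))
```

Swapping `phfe_hamming` for an improved distance measure would leave the ranking logic in `topsis_closeness` unchanged.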