JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE)

A novel rough K-means clustering algorithm based on the weight of density

XIE Juan-ying1, 2, ZHANG Yan1, XIE Wei-xin2, 3, GAO Xin-bo2

J4. 2010, 45(7): 1-6.

Abstract ( 1464 )

PDF (368KB) ( 1577 )

Save

Related Articles | Metrics

A novel rough K-means clustering algorithm was presented based on the weight of exemplar density to overcome the drawback of selecting initial seeds randomly of available rough K-means algorithms. A new density function was defined for each sample according to the denseness of samples around it without any arbitrary parameter, and the top K samples with higher density and far away from each other were selected as initial centers of rough K-means clustering algorithm. Further more the new weight was defined for each exemplar according to the value of the new density function, so that the better could croids of each cluster could be calculated out without influenced by noisy data. Experiments on six UCI data sets and on synthetically geterated data sets with noise points proved that our algorithm got a better clustering result, and had a strong anti-interference performance for noise data.

The learning algorithm of a generative and discriminative combination classifier

JIANG Xue-lian, SHI Hong-bo*

J4. 2010, 45(7): 7-12.

Abstract ( 1533 )

PDF (772KB) ( 1628 )

Save

Related Articles | Metrics

Based on the AdaBoost ensemble framework, a learning algorithm of generative／discriminative combination classifier was proposed. In each round of the algorithm, a generative classifier and a discriminative classifier were learned, and the classifier with the smaller error rate was selected as the individual classifier, and then all the individual classifiers were combined by a weighted approach. Experiment results showed that this method was very good on accuracy and convergence speed.

Feature impprtance study on transfer learning of sentiment text classification

HUANG Xian-li，LUO Dong-mei

J4. 2010, 45(7): 13-17.

Abstract ( 1340 )

PDF (351KB) ( 1496 )

Save

Related Articles | Metrics

A mult-objective particle swarm optimization algorithm based on the chaotic mutation

PEI Sheng-yu，ZHOU Yong-quan

J4. 2010, 45(7): 18-23.

Abstract ( 1321 )

PDF (797KB) ( 1262 )

Save

Related Articles | Metrics

Parameter inversion procedure for a nonlinear constitutive model of conditioned soils

LI Shou-ju1,SHANGGUAN Zi-chang2,3,SUN Wei4,LUAN Mao-tian1,LIU Bo3

J4. 2010, 45(7): 24-27.

Abstract ( 1319 )

PDF (1024KB) ( 1681 )

Save

Related Articles | Metrics

An improved K-means algorithm by weighted distance based on maximum between-cluster variation

ZHANG Xue-feng1, LIU Peng1,2

J4. 2010, 45(7): 28-33.

Abstract ( 1674 )

PDF (1770KB) ( 1786 )

Save

Related Articles | Metrics

To find natural clusters, the criterion function was improved by being defined as the weighted sum of the squared error. The way each point being assigned to the centroid in the iteration of the K-means algorithm was also modified: each point was assigned to the centroid that had minimum weighted distance. The weight was related with the number of points in each cluster, and the parameter of weighted distance was optimized by maximizing the between-cluster variation. Experimental results showed that the improved K-means algorithm significantly enhanced the clustering quality by reducing the probability of larger cluster’s being broken.

A non-deterministic finite automata minimization method based on preorder relation

ZHANG Ming-ming, QIN Yong-bin

J4. 2010, 45(7): 34-38.

Abstract ( 1417 )

PDF (379KB) ( 1113 )

Save

Related Articles | Metrics

In order to reduce the number of non-deterministic finite automata (NFA) states, the preorder relation was introduced. The transition diagram for the NFA was regarded as a directed graph with signs based on graph theory,and then a new method of NFA minimization was proposed. Compared with the current NFA minimization algorithm based on merging the equivalent states, this method could further reduced the number of NFA states while accepting the same languages.

A multi-level clustering approach based on noun phrases for search results

PANG Guan-song, ZHANG Li-sha, JIANG Sheng-yi*, KUANG Li-min, WU Mei-ling

J4. 2010, 45(7): 39-44.

Abstract ( 1654 )

PDF (959KB) ( 1838 )

Save

Related Articles | Metrics

In order to get high qualitative clustering results, the noun phrases was selected as candidate cluster labels and generates basic clusters based on the distribution of candidate cluster labels. And then multi-level clustering was proceeded on basic clusters by using one pass clustering algorithm with linear time complexity. The comparative experiment was carried with our method, NEC algorithm, STC algorithm and Lingo algorithm, and the results showed that our method could get more informative, readable cluster labels and more effective than other three methods.

Improvement of cluster-based genetic segmentation of time series algorithm

WU Da-hua, HE Zhen-feng*

J4. 2010, 45(7): 45-49.

Abstract ( 1401 )

PDF (885KB) ( 1558 )

Save

Related Articles | Metrics

The fitness value function in the algorithm proposed by Vincent S. Tseng et al based on cluster-based genetic segmentation of time series with DWT is inadequate. Two points on the calculation of fitness value of each chromosome was proposed to improve this algorithm: data normalization was used to eliminate the influence of amplitude,and the interclass distance was introduced to make distance between classes distinct. Experiments were conducted to compare the former and improved algorithm,and the results showed that these two improvements improved the fitness value function accuracy which was more beneficial to identify sequence patterns.

A prediction algorithm for multi-data streams based on GEP

DING Chao1, 2, YUAN Chang-an1, 3, QIN Xiao1, 3

J4. 2010, 45(7): 50-54.

Abstract ( 1528 )

PDF (1011KB) ( 1679 )

Save

Related Articles | Metrics

A prediction algorithm for multi-data stream based on gene expression programming(GEP) was proposed for compensating the shortage that the traditional linear regression method could only adapt to a simple prediction model and predict AVG in the future window. A method by using sliding windows to partition data stream was given in the multi-data stream. The main conception of Multi-Streams was defined, and the map relation in it was revealed. An algorithm was given to pretreatment multi-data stream according the map relation and the sliding windows above. An adaptive forecasting model was put forward based on DSMA-GEP in the multi-data stream. The experience with the real data showed that the method was efficient.

A local linear emedding agorithm based on harmonicmean geodesic kernel

ZENG Weng-fu1, HUANG Tian-qiang1,2, LI Kai1, YU YANG-qiang1, GUO Gong-de1,2

J4. 2010, 45(7): 55-59.

Abstract ( 1811 )

PDF (1002KB) ( 1613 )

Save

Related Articles | Metrics

An improved algorithm was proposed to overcome the shortcomings of the existing local linear embedding algorithm that was not suitable for the nonuniform distribution data and not use the information of distant points. First, to improv the accuracy of the algorithm the geodesic-distance was introduced into the new algorithm in order to take advantage of the information of distant points, and then the harmonic-mean geodesic-kernel matrix was constructed by using the harmonic-mean standardization,which could process robustly non-uniform distribution data. The results of the experiments on UCI data sets showed that the improved algorithm could obtain better performance than the classical local linear embedding algorithm on dimension reduction.

A granular computing approach for knowledge hiding

QIU Tao-rong, WANG Lu, XIONG Shu-jie, BAI Xiao-ming

J4. 2010, 45(7): 60-64.

Abstract ( 1489 )

PDF (866KB) ( 1740 )

Save

Related Articles | Metrics

The protection of sensitive knowledge contained in the data is a key research issue in privacy preserving data mining. Granular computing has a powerful ability in using different levels of granulations during problem solving. By using granular computing model derived by rough set, a method for the knowledge hiding in the data was discussed. First, some concepts such as tolerance relations, tolerance information granules and the size of granules were introduced. Second, an approach for knowledge hiding was proposed. Final, the given example and testing in real datasets showed that the proposed method was effective.

Age estimation of facial images based on non-negative matrix factorization with sparseness constraints

DU Ji-xiang1,2, YU Qing1, ZHAI Chuan-ming1

J4. 2010, 45(7): 65-69.

Abstract ( 1372 )

PDF (972KB) ( 1681 )

Save

Related Articles | Metrics

Automatic age estimation based on facial images has been become an important orientation of the face recognition research. By applying sparseness constrains to base matrix or coefficient matrix in the factorization of non-negative matrix factorization, a new subspace could be formed with part-based representation ability to describe image data. And the radial basis function neural networks was used to extract the aging information contained in most facial image. The experimental results demonstrated that the non-negative matrix factorization with sparseness constraints algorithm could achieved a better performance for age estimation task.

A context based method for ROI detection in digitized mammograms

GUO Qiao-jin, DING Yi, LI Ning

J4. 2010, 45(7): 70-75.

Abstract ( 1515 )

PDF (538KB) ( 1520 )

Save

Related Articles | Metrics

In order to solve the problem that context information was often missed in ROI (region of interest)detection. probability latent semantic analysis(PLSA) was employed to extract the context information from surrounding regions,and the context information was used as context features to aid ROI detection. Experimental results showed that the context information can effectively improve the accuracy of ROI detection.

An algorithm for color image segmentation based on region growth

LIU Zhan-jie1, MA Ru-ning1, ZOU Guo-ping1, ZHONG Bao-jiang2, DING Jun-di 3

J4. 2010, 45(7): 76-80.

Abstract ( 1997 )

PDF (1233KB) ( 2776 )

Save

Related Articles | Metrics

A color image segmentation algorithm was proposed under the framework of seeded region growing (SRG).Compared to existing SRG methods, this method was robust to the selection of initial seedpixels and the order of region growing. Specifically, there were three main steps: 1) computing a neighborhood based similarity factor (NSF) based on the local color histogram of each pixel; 2) building the criteria of seed selection, region growth and region termination principled by NSF; 3) assigning labels to those un-segmented pixels for a final segmentation. Experimental results showed the superior performance of our methods in computational complexity and segmentation accuracy to the popular method of JSEG.

Image retrieval based on saliency weighted color and texture

ZOU Guo-ping1, MA Ru-ning1, DING Jun-di2, ZHONG Bao-jiang3

J4. 2010, 45(7): 81-85.

Abstract ( 1571 )

PDF (2157KB) ( 1813 )

Save

Related Articles | Metrics

The term `saliency' that indicates the visual importance of pixels in an image has a significant effect in contentbased image retrieval. A new image retrieval method was presented by exploring the saliency-weighted image features of color and texture,which did not require to segment those salient regions from images, but only to weigh the original features of color and texture by saliency of pixels. In such a way, the color and texture features of salient regions were naturally enhanced. Four popular saliency maps on images were mainly tested and the experimental results showed the effectiveness of our method with higher retrieval accuracy.

Multi-region image segmentation based on pulse coupled neural network

XU Guang-zhu1, LIU Ming2, REN Dong1, MA Yi-de3, LIU Xiao-li1

J4. 2010, 45(7): 86-93.

Abstract ( 1454 )

PDF (2490KB) ( 1427 )

Save

Related Articles | Metrics

In order to solve the problems that the traditional pulse coupled neural network (PCNN) refers only to binary segmentation and does not work well for bigger image regions with sluggish gray variation,a multi-region image segmentation method was proposed based on PCNN. First, the initial image was preprocessed by smoothing and normalizing and put into PCNN. Then, with the help of fast linking, neurons with similar states fired synchronously to finish single region segmentation in each iteration processing. After pre-configured iterations, the total firing times of each neuron were calculated as the pixel intensity of new input image,and then preprocessed by smoothing and normalizing again,and finally put into PCNN. The above processing was repeated to complete multi-region segmentation. Experimental results on Berkeley image database showed that the proposed method was efficient, robust and could be used to segment image effectively.

A review of fingerprint image segmentation methods

GUO Wen-juan, YANG Gong-ping*, DONG Jin-li

J4. 2010, 45(7): 94-101.

Abstract ( 2097 )

PDF (597KB) ( 2062 )

Save

Related Articles | Metrics

In an automatic fingerprint identification system, precise segmentation of fingerprint image contributes to speeding up subsequent processes as well as improving the recognition accuracy. Fingerprint segmentation methods can be roughly divided into three groups, including the methods based on pixel features, the methods based on block features and the methods based on global features. Here each group was introduced according to the used features and in turn classifiers. Segmentation accuracy and time complexities of the methods were also provided for brief comparison. In conclusion, the remaining problems and list some future research directions were detailed.

Minimum within-class variance SVM with absent features

SONG Yu-dan, WANG Shi-tong*

J4. 2010, 45(7): 102-107.

Abstract ( 1740 )

PDF (547KB) ( 1712 )

Save

Related Articles | Metrics

In the classification of data with absent features, the lately proposed support vector machine with absent features (AFSVM) has some drawbacks: the obtained classification hyper plane with AF-SVM can not well adapt to the data’s overall distribution, and the proportion of the misclassified data differs greatly between the two classes. To overcome these drawbacks, a minimum within-class variance SVM with absent features (AF-V-SVM) was proposed based on the technology of minimum class variance SVM (MCVSVM).On the one hand, AF-V-SVM could improve the direction of the classification hyper plane with the information of the distribution feature of the data set; on the other hand, this method adjusted the proportion of misclassified data by freely setting the definition space of classification margin. Experiments showed that the method in this paper was superior to other absent features based classification methods in the aspects of classification accuracy and rationality.

A new Multi-instance learning method for scene classification

WANG Gang, XU Xin-shun*

J4. 2010, 45(7): 108-113.

Abstract ( 1322 )

PDF (1227KB) ( 1253 )

Save

Related Articles | Metrics

Multi-Instance learning (MIL) is a learning framework proposed recently, and has been successfully used in scene classification. First,A new image MultiInstance (MI) bag generating method was proposed, which models an image with a Gaussian Mixed Model (GMM). The generated GMM was treated as an MI bag and the components of the GMM were the instances of the corresponding bag. Then, the information bottleneck clustering was employed to transform the MIL problem into single-instance learning problem so that single-instance classifiers could be used for classification. Ensemble learning was involved to further enhance classifiers’ generalization ability. Experimental results showed that the proposed method was superior to some common MI algorithms on average in a 5-class scene classification task.

A decision tree construction approach: two-step forward is better than one

ZHANG Wen, ZHANG Hua-xiang*, LI Ming-fang, JI Hua

J4. 2010, 45(7): 114-118.

Abstract ( 1491 )

PDF (415KB) ( 1360 )

Save

Related Articles | Metrics

In order to increase the probability of finding the global optimum, a novel decision tree construction algorithm adopting two-step forward idea was proposed based on C4.5 algorithm. The algorithm was more possible to get the global optimum of a classification task because it considered the information gained from selecting two attributes simultaneously, rather than the information gained from just selecting an optimal single attribute. Experimental results on 10 UCI benchmark data sets showed that it outperforms C4.5.

A kind of feature selection based on classification accuracy of SVM

YI Chao-qun, LI Jian-ping, ZHU Cheng-wen

J4. 2010, 45(7): 119-121.

Abstract ( 1545 )

PDF (320KB) ( 1530 )

Save

Related Articles | Metrics

The sequential forward selection based on classification accuracy (CA-SFS) was proposed by associating sequential forward selection (SFS) with generalized sequential forward selection (GSFS). It varied the value of r in GSFS and employs SVM (support vector machine)as the classifier. The classification accuracy was taken as a criterion to decide the retention or elimination of features. Simulations showed that CA-SFS performed well both in selecting fewer features and classifying samples.

A robot local path plan based on artificial immune network

HU Xuan-zi1, XIE Cun-xi2

J4. 2010, 45(7): 122-126.

Abstract ( 1342 )

PDF (728KB) ( 1794 )

Save

Related Articles | Metrics

To solve the problem of mobile robot path plan in the complex environment, artificial immune network(AIN) based mobile robot global path planning algorithm was presented. Relation between artificial immune network and robot local path plan was built, and the mobile robot path plan algorithm based on artificial immune system was proposed. The proposed algorithm was validated by simulation and compared with artificial potential field. Experimental results showed that the proposed approach was feasible and effective in the complex obstacle environment.

Table of Content