Raghunath Satpathy, Rashmiranjan Behera, Susant Ku Padhi and Rajesh Kumar Guru
Data mining is now an essential tool for discovering hidden information and important patterns in large data sets. The present work is a pilot study that compares the results of a sequence-based phylogenetic study with clustering based on selected physicochemical and structural features of laccase enzyme sequences. A total of 50 homologous sequences were obtained, specific to each organism group (plants, fungi and bacteria). Multiple sequence alignment was performed, followed by phylogenetic tree construction and a consistency study to identify the major clusters. Domain and motif analysis was then carried out to support the study of the divergence pattern of the laccase enzyme sequences. Thereafter, 13 physicochemical and structural features were computed for each enzyme sequence. After data normalisation, k-means clustering placed the fungal, bacterial and plant sequences in three distinct clusters. The analysis indicates that sequence-based classification is in good agreement with classification of proteins on a physicochemical basis. The method can be further optimised with different clustering algorithms to identify specific physicochemical features that would help in the classification of proteins.
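The normalisation and k-means step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline. The abstract does not name the normalisation scheme, the 13 features, or the software used, so the sketch assumes min-max scaling and three made-up features (molecular weight, isoelectric point and a hydropathy score). The feature values are invented, and the deterministic farthest-first seeding is chosen here for reproducibility. Only the number of clusters (three) comes from the text.

```python
# Minimal sketch: min-max normalisation followed by k-means (k = 3).
# All feature values below are INVENTED for illustration; they are not
# data from the study, and the three features are assumed stand-ins
# for the 13 features mentioned in the abstract.

def min_max_normalise(rows):
    """Scale every feature column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first_centroids(points, k):
    """Deterministic seeding (an assumption of this sketch): start from
    the first point, then repeatedly add the point farthest from the
    centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids

def k_means(points, k, max_iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = farthest_first_centroids(points, k)
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids

# Hypothetical rows: (molecular weight in kDa, isoelectric point, hydropathy)
features = [
    (56.0, 4.5, -0.30), (58.0, 4.8, -0.28), (57.0, 4.6, -0.32),  # "fungi"
    (30.0, 6.5, -0.10), (32.0, 6.8, -0.12), (31.0, 6.6, -0.08),  # "bacteria"
    (62.0, 8.8, -0.50), (63.0, 9.0, -0.48), (64.0, 9.2, -0.52),  # "plant"
]

normalised = min_max_normalise(features)
labels, centroids = k_means(normalised, k=3)
print(labels)
```

Normalising first matters because k-means relies on Euclidean distance: without it, a feature with a large numeric range, such as molecular weight, would dominate the others. Comparing the resulting labels with the known organism groups gives a simple check of how well feature-based clustering agrees with the phylogenetic grouping.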