Advances in bioinformatics-based protein function prediction

Funding:

National Natural Science Foundation of China (31600148); Natural Science Foundation of Shandong Province (ZR2021MC018)
    Abstract:

    As computing power grows and biological data expand rapidly, bioinformatics has become a mainstream approach to biological problems. Protein function prediction is an important task in biomedical research and drug discovery, and bioinformatics-based prediction has become a research hotspot. This review groups bioinformatics-based protein function prediction methods into three categories: protein sequence-based methods, protein structure-based methods, and protein interaction network-based methods. It then analyzes and summarizes the specific algorithms behind these methods and their latest research progress, providing a reference for further work on protein function prediction in biomedical research and drug discovery.

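Cite this article

HE Xinyuan, LIU Yang, ZENG Xianghe, GAO Rongfeng, TIAN Zhen, FAN Xiangyu. Advances in bioinformatics-based protein function prediction[J]. Chinese Journal of Biotechnology, 2024, 40(7): 2087-2099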

History
  • Received: 2023-09-30
  • Published online: 2024-07-08
  • Publication date: 2024-07-25