Methods for binning metagenomic contigs
Authors: JIANG Zhongjun, LI Xiaobo
    Abstract:

    Metagenomics extracts all microbial genetic material directly from environmental samples, without the pure culture on growth media that traditional methods require. This allows an in-depth understanding of the structure and function of microbial communities and is of great significance for the diagnosis and treatment of diseases, environmental management, and the understanding of life. The genetic material extracted from the environment is sequenced to obtain reads, which assembly tools then assemble into contigs. Binning these contigs makes it possible to reconstruct more complete genomes from metagenomic samples, and the quality of binning directly affects subsequent biological analysis. How to effectively bin contigs that originate from different microbial genomes has therefore become both a research hotspot and a challenge in metagenomics. Machine learning methods are widely used for binning metagenomic contigs and are generally divided into unsupervised contig clustering methods and supervised contig classification methods. This review introduces the methods for binning metagenomic contigs and analyzes their problems, such as low classification accuracy, high time cost, and difficulty in reconstructing more microbial genomes from complex metagenomic datasets. We also summarize directions for future research on and development of binning methods for metagenomic contigs, and suggest that semi-supervised learning, ensemble learning, and deep learning be combined with more effective feature representations of the data to improve binning performance.
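
One way to picture the unsupervised contig clustering route mentioned in the abstract is the minimal Python sketch below. It is not the method of any tool covered by the review. It assumes tetranucleotide frequencies as the only feature, a plain k-means with farthest-first initialization as the clustering step, and simulated contigs whose base composition is invented for illustration. All names (composition, kmeans, simulate_contig) are hypothetical.

```python
import math
import random
from itertools import product

K = 4  # tetranucleotides; an assumed, commonly used k-mer size
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def composition(seq):
    """Return the normalized k-mer frequency vector of one contig."""
    counts = [0] * len(KMERS)
    seq = seq.upper()
    for i in range(len(seq) - K + 1):
        j = INDEX.get(seq[i:i + K])
        if j is not None:  # skip windows containing N or other ambiguous bases
            counts[j] += 1
    total = sum(counts) or 1  # avoid division by zero for very short contigs
    return [c / total for c in counts]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, n_bins, n_iter=20):
    """Assign each vector to one of n_bins clusters (bins)."""
    # Farthest-first initialization: deterministic, spreads the centers out.
    centers = [vectors[0]]
    while len(centers) < n_bins:
        centers.append(
            max(vectors, key=lambda v: min(distance(v, c) for c in centers))
        )
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: nearest center for every contig.
        labels = [
            min(range(n_bins), key=lambda c: distance(v, centers[c]))
            for v in vectors
        ]
        # Update step: move each center to the mean of its members.
        for c in range(n_bins):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def simulate_contig(rng, weights, length=2000):
    """Draw a random contig with the given A, C, G, T probabilities (toy data)."""
    return "".join(rng.choices("ACGT", weights=weights, k=length))


rng = random.Random(0)
# Two invented source genomes: one AT-rich, one GC-rich.
contigs = [simulate_contig(rng, [0.4, 0.1, 0.1, 0.4]) for _ in range(5)]
contigs += [simulate_contig(rng, [0.1, 0.4, 0.4, 0.1]) for _ in range(5)]
labels = kmeans([composition(c) for c in contigs], n_bins=2)
print(labels)  # one bin label per contig
```

Real communities are far harder to separate than this toy case. Published binners generally combine sequence composition with read coverage across samples and use more robust clustering or classification models, which is where the semi-supervised, ensemble, and deep learning directions named in the abstract come in.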

Get Citation

JIANG Zhongjun, LI Xiaobo. Methods for binning metagenomic contigs[J]. Acta Microbiologica Sinica, 2022, 62(8): 2954-2968.

History
  • Received: December 17, 2021
  • Revised: March 29, 2022
  • Online: August 16, 2022