Marker gene broken caused overestimation on the contamination of metagenome-assembled genomes and its correction
    Abstract:

    [Objective] To identify and correct the overestimation of contamination in metagenome-assembled genomes (MAGs) caused by broken marker genes. [Methods] We first analyzed the impact of broken genes on genome quality assessment using simulated genomes generated by randomly fragmenting the complete genomes of isolates. We then designed a correction pipeline that identifies pairs of broken genes derived from the same "source" gene according to their taxonomic annotation against the nr database, and corrects the genome contamination estimate by removing the redundant marker genes. [Results] Genome contamination was positively correlated with the degree of genome fragmentation in both the simulated genomes and the MAGs obtained by genome binning. On the simulated genomes, the pipeline adjusted the contamination to the level of the complete genome. When tested on 760 MAGs from gut and soil samples that showed contamination, the pipeline reduced the contamination of nearly half of the MAGs, and for 43 of them it dropped to 0. [Conclusion] Our pipeline can correct, to some extent, the overestimated genome contamination caused by broken genes and improve the usability of MAGs. It is expected to be applicable to the quality assessment of the growing number of MAGs.
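
    The correction idea in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the `MarkerHit` fields, the `same_source` rule (same best-hit reference and taxon, nearly disjoint aligned intervals), the `max_overlap` threshold, and the simplified contamination formula (extra marker copies divided by the number of markers) are all assumptions made for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkerHit:
    """One predicted gene that matched a single-copy marker (hypothetical record)."""
    marker: str     # marker gene family identifier
    gene_id: str    # predicted gene in the MAG
    ref_id: str     # best-hit reference protein, e.g. from a search against nr
    taxon: str      # taxonomic annotation of that best hit
    ref_start: int  # aligned interval on the reference, 1-based inclusive
    ref_end: int


def same_source(a: MarkerHit, b: MarkerHit, max_overlap: int = 10) -> bool:
    """Assumed rule: two hits are fragments of one source gene if they share
    the best-hit reference and taxon and their reference intervals barely overlap."""
    if a.ref_id != b.ref_id or a.taxon != b.taxon:
        return False
    overlap = min(a.ref_end, b.ref_end) - max(a.ref_start, b.ref_start) + 1
    return overlap <= max_overlap


def effective_copies(hits: list[MarkerHit]) -> int:
    """Count copies of one marker after greedily grouping fragments."""
    groups: list[list[MarkerHit]] = []
    for hit in sorted(hits, key=lambda h: h.ref_start):
        for group in groups:
            if all(same_source(hit, member) for member in group):
                group.append(hit)
                break
        else:
            groups.append([hit])
    return len(groups)


def contamination(hits: list[MarkerHit], n_markers: int, correct: bool = False) -> float:
    """Simplified contamination (%): extra marker copies over the marker count."""
    by_marker: dict[str, list[MarkerHit]] = defaultdict(list)
    for hit in hits:
        by_marker[hit.marker].append(hit)
    extra = 0
    for marker_hits in by_marker.values():
        copies = effective_copies(marker_hits) if correct else len(marker_hits)
        extra += copies - 1
    return 100.0 * extra / n_markers


# Toy data: marker M1 is split across two contigs (same reference, adjacent
# intervals); marker M2 has two genuinely different copies.
hits = [
    MarkerHit("M1", "g1", "refA", "Bacteroides", 1, 120),
    MarkerHit("M1", "g2", "refA", "Bacteroides", 118, 300),
    MarkerHit("M2", "g3", "refB", "Bacteroides", 1, 200),
    MarkerHit("M2", "g4", "refC", "Escherichia", 1, 200),
    MarkerHit("M3", "g5", "refD", "Bacteroides", 1, 150),
]
raw = contamination(hits, n_markers=4)
corrected = contamination(hits, n_markers=4, correct=True)
print(raw, corrected)
```

    In this toy case only the split marker is merged, so the redundant copy from genuinely different sources still counts toward contamination. Established tools weight markers by collocated marker sets, which this sketch deliberately ignores.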

Get Citation

Hao Li, Dongxu Yang, Linran Wen, Wei Zheng, Feng Guo. Marker gene broken caused overestimation on the contamination of metagenome-assembled genomes and its correction[J]. Acta Microbiologica Sinica, 2021, 61(9): 2921-2933.

History
  • Received: December 12, 2020
  • Revised: February 20, 2021
  • Online: September 04, 2021