Assessment and correction of contamination overestimation in metagenome-assembled genomes caused by assembly fragmentation
Funding: National Natural Science Foundation of China (31670492, 31500100)


Marker gene broken caused overestimation on the contamination of metagenome-assembled genomes and its correction
    Abstract (Chinese version):

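    [Objective] To identify and correct the overestimation of contamination in genomes assembled from metagenomic sequencing data that is caused by broken marker genes. [Methods] Simulated data constructed from the complete genomes of isolates were used to analyze the effect of broken genes on genome quality assessment and to set the correction parameters. Whether two broken marker genes (a broken gene pair) derive from the same marker gene was determined from their taxonomic annotation against the nr database, and contamination was recalculated after the redundant broken genes were removed. [Results] Simulated fragmentation of complete isolate genomes showed that the more fragmented a genome is, the higher its estimated contamination; the same pattern was observed in draft microbial genomes obtained by binning. Our correction pipeline brought the contamination of the fragmented isolate genomes back to the level of the complete genomes. After correction of 760 draft genomes with contamination greater than 0 from gut and soil metagenomes, contamination decreased for nearly half of them, and 43 genomes dropped to a contamination of 0. [Conclusion] Our pipeline can, to some extent, correct the overestimation of genome contamination caused by broken genes, improve the usability of binned draft genomes, and be applied to the growing demand for quality assessment of metagenome-derived genomes.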
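The simulation idea can be illustrated with a minimal sketch: cut a complete genome into contigs at random positions and count the marker genes that end up split across a contig boundary, since each split gene can be detected as two partial copies of the same marker. It assumes genes are given as `[start, end)` coordinates on a single linear chromosome and that cut points are drawn uniformly at random; the function names, the uniform-cut model and the gene layout in the usage block are illustrative assumptions, because the abstract does not state the paper's simulation parameters.

```python
import random


def fragment(genome_len: int, n_breaks: int, rng: random.Random) -> list[tuple[int, int]]:
    """Cut a linear genome at n_breaks distinct random positions; return contigs as [start, end) intervals."""
    cuts = sorted(rng.sample(range(1, genome_len), n_breaks))
    bounds = [0] + cuts + [genome_len]
    return list(zip(bounds[:-1], bounds[1:]))


def broken_genes(genes: list[tuple[int, int]], contigs: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return genes whose [start, end) span contains a contig boundary, i.e. genes split across two contigs."""
    cut_points = [end for _, end in contigs[:-1]]
    return [g for g in genes if any(g[0] < c < g[1] for c in cut_points)]


if __name__ == "__main__":
    rng = random.Random(42)
    genome_len = 100_000
    # Hypothetical layout: 80 marker genes of 1 kb, one every 1.25 kb.
    genes = [(s, s + 1_000) for s in range(0, genome_len, 1_250)]
    for n_breaks in (10, 50, 200):
        contigs = fragment(genome_len, n_breaks, rng)
        print(len(contigs), "contigs:", len(broken_genes(genes, contigs)), "broken marker genes")
```

With more cut points, more marker genes tend to straddle a boundary, which is the mechanism by which fragmentation can inflate the apparent number of marker copies.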

    Abstract:

    [Objective] To identify and correct the overestimation of contamination in metagenome-assembled genomes (MAGs) caused by broken marker genes. [Methods] The impact of broken genes on genome quality assessment was first analyzed using simulated genomes generated by randomly fragmenting the complete genomes of isolates. We designed a correction pipeline that identifies pairs of broken genes derived from the same "source" gene according to their taxonomic annotation against the nr database. Genome contamination was then corrected by removing the redundant marker genes. [Results] A positive correlation between genome contamination and the degree of genome fragmentation was observed both in the simulated genomes and in MAGs obtained by genome binning. On the simulated genomes, the correction pipeline, built on identifying broken genes from the same "source" gene, brought contamination back to the level of the complete genomes. Testing on 760 MAGs with non-zero contamination from gut and soil samples, we observed a reduction in contamination for nearly half of the MAGs, with 43 of them dropping to 0. [Conclusion] Our pipeline can, to some extent, correct the contamination overestimation caused by broken genes and improve the usability of MAGs. It is expected to be applicable to the quality assessment of the growing number of MAGs.
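The correction idea can be sketched as follows: fragments of the same marker that sit on different contigs and carry the same nr-based taxonomic label are counted as one gene before contamination is recomputed. This is a minimal sketch under stated assumptions, not the authors' pipeline. The contamination formula is a simplified single-copy-marker estimate (extra copies divided by the number of markers), not CheckM's collocated-marker-set weighting, and the `MarkerHit` fields, the `PARTIAL_COV` and `OVERLAP_TOL` thresholds, and all function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

PARTIAL_COV = 0.9  # hypothetical: hits covering < 90% of the marker model count as fragments
OVERLAP_TOL = 0.1  # hypothetical: allowed overlap when two fragments' coverages are summed


@dataclass(frozen=True)
class MarkerHit:
    marker: str     # marker gene identifier
    contig: str     # contig on which the hit was found
    hmm_cov: float  # fraction of the marker model covered by the hit (0-1)
    taxon: str      # hypothetical field: taxonomic label of the hit from a search against nr


def contamination(copies: dict[str, int], markers: list[str]) -> float:
    """Simplified estimate: extra copies summed over markers, divided by the marker count, in percent."""
    extra = sum(max(copies.get(m, 0) - 1, 0) for m in markers)
    return 100.0 * extra / len(markers)


def raw_copies(hits: list[MarkerHit]) -> dict[str, int]:
    """Count every hit as one copy, as a naive assessment of a fragmented genome would."""
    copies: dict[str, int] = defaultdict(int)
    for h in hits:
        copies[h.marker] += 1
    return dict(copies)


def corrected_copies(hits: list[MarkerHit]) -> dict[str, int]:
    """Count a compatible pair of fragments of the same marker as a single copy."""
    by_marker: dict[str, list[MarkerHit]] = defaultdict(list)
    for h in hits:
        by_marker[h.marker].append(h)
    copies = {}
    for marker, group in by_marker.items():
        n = sum(1 for h in group if h.hmm_cov >= PARTIAL_COV)  # complete copies
        partial = [h for h in group if h.hmm_cov < PARTIAL_COV]
        used: set[int] = set()
        for i, a in enumerate(partial):
            if i in used:
                continue
            used.add(i)
            # Greedily look for a mate: different contig, same taxon, coverages that fit one gene.
            for j in range(i + 1, len(partial)):
                b = partial[j]
                if (j not in used and b.contig != a.contig and b.taxon == a.taxon
                        and a.hmm_cov + b.hmm_cov <= 1.0 + OVERLAP_TOL):
                    used.add(j)  # b is treated as the other half of a's source gene
                    break
            n += 1  # one copy for the pair, or for an unpaired fragment
        copies[marker] = n
    return copies


if __name__ == "__main__":
    hits = [
        MarkerHit("m1", "c1", 1.0, "Escherichia coli"),
        MarkerHit("m2", "c2", 0.50, "Escherichia coli"),  # two halves of one broken gene
        MarkerHit("m2", "c3", 0.45, "Escherichia coli"),
        MarkerHit("m3", "c1", 0.95, "Escherichia coli"),
        MarkerHit("m4", "c4", 1.0, "Escherichia coli"),   # a genuine second copy
        MarkerHit("m4", "c5", 1.0, "Bacillus subtilis"),
    ]
    markers = ["m1", "m2", "m3", "m4"]
    print(contamination(raw_copies(hits), markers))        # 50.0
    print(contamination(corrected_copies(hits), markers))  # 25.0
```

In the usage block, only the split copy of `m2` is merged; the two complete copies of `m4` still count as contamination, so the estimate falls from 50% to 25% rather than to zero.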

Cite this article:

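LI Hao, YANG Dongxu, WEN Linran, ZHENG Wei, GUO Feng. Marker gene broken caused overestimation on the contamination of metagenome-assembled genomes and its correction. Acta Microbiologica Sinica, 2021, 61(9): 2921-2933.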

History:
  • Received: 2020-12-12
  • Last revised: 2021-02-20
  • Published online: 2021-09-04