A review of methods for binning metagenomic contigs

Fund project: National Natural Science Foundation of China (61373057)


Methods for binning metagenomic contigs

    Abstract:

    Metagenomics makes it possible to extract the entire genetic material of the microorganisms in an environmental sample directly, without the pure culture on growth media that traditional methods require. It gives researchers an important means of studying the structure and function of microbial communities, and it is of great significance for diagnosing and treating disease, managing the environment, and understanding life. The genetic material extracted from the environment is sequenced to obtain reads, which read assembly tools then assemble into contigs. Binning these contigs allows more complete genes to be reconstructed from metagenomic samples. Because binning quality directly affects downstream biological analysis, effectively binning contigs that mix genes from different microorganisms has become both a research hotspot and a challenge in metagenomics. Machine learning methods are widely used for binning metagenomic contigs and are generally divided into supervised contig classification methods and unsupervised contig clustering methods. This review gives a fairly comprehensive account of binning methods for metagenomic contigs and analyzes both the classification and the clustering approaches in depth. It finds that they suffer from low classification accuracy, long binning times, and difficulty reconstructing more microbial genes from complex datasets, and it outlines prospects for future research on and development of contig binning methods. The authors suggest using semi-supervised learning, ensemble learning, and deep learning, together with more effective feature representations of the data, to improve binning results.

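Cite this article

Jiang Zhongjun (姜忠俊), Li Xiaobo (李小波). A review of methods for binning metagenomic contigs. Acta Microbiologica Sinica (微生物学报), 2022, 62(8): 2954-2968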

History
  • Received: 2021-12-17
  • Last revised: 2022-03-29
  • Published online: 2022-08-16