Abstract:
Objective: To study the phylogenetic relationships and genomic diversity of obligate intestinal commensal bacteria in populations from different regions of Xinjiang, and to provide a theoretical basis for developing personalized functional probiotics for different populations.
Methods: A total of 136 strains of Bifidobacterium longum subsp. longum were isolated from mothers and infants of the Uygur and Kazak ethnic groups in the Kashgar and Yili regions of Xinjiang. Their genomes were compared with publicly available genomes of strains from other regions of China.
Results: The average genome size, G+C content, and number of coding sequences (CDSs) of B. longum subsp. longum were 2.38 Mb, 59.91%, and 2,160, respectively. In a phylogenetic tree constructed from core genes, all strains from Xinjiang fell into four clades. Strains from the same ethnic group but different geographical regions fell into different clades, whereas strains from geographically close regions but different ethnic groups overlapped to some degree. At a larger geographical scale (China as a whole), B. longum subsp. longum strains and their functional genes showed clear geographical and ethnic distribution patterns. Analysis of COG functional genes and glycoside hydrolase (GH) gene families showed that functional gene profiles differed greatly among strains from the same ethnic group but different regions. The GH13 (α-amylase) and GH43 (β-xylosidase/α-L-arabinofuranosidase) families were more abundant in strains from the Kashgar region. Conversely, strains from different ethnic groups but geographically close regions had similar COG and GH gene family profiles.
Conclusion: B. longum subsp. longum strains and their functional genes from different geographical regions and ethnic groups in Xinjiang showed clear geographical and ethnic distribution patterns, and the geographical pattern became more pronounced as the geographical scale increased. The relationship between the geographical scale of host populations and the co-evolution and specificity of strains should be verified with larger-scale strain genomic data.
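The per-strain summary statistics reported in the Results (genome size and G+C content, averaged across strains) can be illustrated with a short sketch. The abstract does not describe the actual pipeline, so the following is only an assumption: each genome is taken as a FASTA assembly, G+C content is computed as (G + C) / (A + C + G + T) × 100 with ambiguous bases such as N excluded from the denominator, and the per-strain values are averaged without weighting. The function names `parse_fasta`, `genome_stats`, and `average_stats` are hypothetical.

```python
from statistics import mean


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {record_id: sequence} (hypothetical helper)."""
    records: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Record ID is the first whitespace-delimited token of the header
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {rid: "".join(parts) for rid, parts in records.items()}


def genome_stats(fasta_text: str) -> tuple[int, float]:
    """Return (total assembly length in bp, G+C content in %) for one genome."""
    seq = "".join(parse_fasta(fasta_text).values())
    # Assumption: ambiguous bases (e.g. N) count toward length but not toward the G+C denominator
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    gc_percent = 100.0 * gc / acgt if acgt else 0.0
    return len(seq), gc_percent


def average_stats(genomes: dict[str, str]) -> tuple[float, float]:
    """Unweighted mean genome size (Mb) and mean G+C content (%) across strains."""
    stats = [genome_stats(fasta) for fasta in genomes.values()]
    return mean(s[0] for s in stats) / 1e6, mean(s[1] for s in stats)
```

For example, averaging two toy assemblies of 10 bp (60% G+C) and 8 bp (50% G+C) gives a mean size of 9 bp and a mean G+C content of 55%. Averaging per-strain values, rather than pooling all bases, keeps each strain's weight equal regardless of assembly length.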