Hello,
I am working with lake water samples, focusing on harmful cyanobacteria. I ran my metagenomic sequences through HUMAnN3.
The configuration I used for my sample is:
HUMAnN Configuration ( Section : Name = Value )
database_folders : nucleotide = /home/hassan/Desktop/hdatabases/chocophlan
database_folders : protein = /home/hassan/Desktop/hdatabases/uniref50ecf
database_folders : utility_mapping = /home/hassan/Desktop/b/utility_mapping
run_modes : resume = False
run_modes : verbose = False
run_modes : bypass_prescreen = False
run_modes : bypass_nucleotide_index = False
run_modes : bypass_nucleotide_search = False
run_modes : bypass_translated_search = False
run_modes : threads = 1
alignment_settings : evalue_threshold = 1.0
alignment_settings : prescreen_threshold = 0.01
alignment_settings : translated_subject_coverage_threshold = 50.0
alignment_settings : translated_query_coverage_threshold = 90.0
alignment_settings : nucleotide_subject_coverage_threshold = 50.0
alignment_settings : nucleotide_query_coverage_threshold = 90.0
output_format : output_max_decimals = 10
output_format : remove_stratified_output = False
output_format : remove_column_description_output = False
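To compare this run with reruns that use different settings, the configuration dump above can be loaded into a dictionary. This is a minimal sketch that assumes every setting line follows the "section : name = value" pattern shown above. parse_humann_config is a hypothetical helper, not part of HUMAnN.

```python
import re

# Matches setting lines such as "run_modes : threads = 1" (pattern assumed
# from the configuration dump above; header lines do not match).
CONFIG_LINE = re.compile(r"^\s*(\w+)\s*:\s*(\w+)\s*=\s*(.*?)\s*$")


def convert(raw):
    """Turn "True"/"False" into bools and numeric strings into int/float."""
    if raw in ("True", "False"):
        return raw == "True"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw  # paths and other text stay as strings


def parse_humann_config(text):
    """Return {section: {name: value}} for each setting line in text."""
    config = {}
    for line in text.splitlines():
        match = CONFIG_LINE.match(line)
        if match:
            section, name, raw = match.groups()
            config.setdefault(section, {})[name] = convert(raw)
    return config


sample = """\
HUMAnN Configuration ( Section : Name = Value )
database_folders : protein = /home/hassan/Desktop/hdatabases/uniref50ecf
run_modes : bypass_prescreen = False
run_modes : threads = 1
alignment_settings : translated_query_coverage_threshold = 90.0
"""
cfg = parse_humann_config(sample)
print(cfg["alignment_settings"])
```

Two parsed runs can then be compared section by section to see which thresholds or database folders changed.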
The output I got:
Removing spaces from identifiers in input file …
Running metaphlan …
Found g__GGB43952.s__GGB43952_SGB61317 : 24.13% of mapped reads ( )
Found g__GGB43067.s__GGB43067_SGB57480 : 13.42% of mapped reads ( )
Found g__GGB46342.s__GGB46342_SGB64120 : 13.10% of mapped reads ( )
Found g__GGB24856.s__GGB24856_SGB81948 : 8.95% of mapped reads ( )
Found g__GGB43951.s__GGB43951_SGB61315 : 6.47% of mapped reads ( )
Found g__GGB35689.s__GGB35689_SGB85076 : 6.18% of mapped reads ( )
Found g__Polynucleobacter.s__Polynucleobacter_sp_MWH_UH24A : 4.19% of mapped reads ( )
Found g__GGB25977.s__GGB25977_SGB37971 : 3.43% of mapped reads ( )
Found g__GGB59202.s__GGB59202_SGB80969 : 3.40% of mapped reads ( )
Found t__SGB13449 : 3.15% of mapped reads ( s__Cylindrospermopsis_raciborskii,g__Raphidiopsis.s__Raphidiopsis_brookii )
Found t__SGB24761 : 2.51% of mapped reads ( s__Cuspidothrix_issatschenkoi )
Found t__SGB24471 : 2.42% of mapped reads ( s__Phenylobacterium_sp_HYN0004 )
Found g__GGB43953.s__GGB43953_SGB61318 : 2.39% of mapped reads ( )
Found g__Candidatus_Methylopumilus.s__Candidatus_Methylopumilus_rimovensis : 1.20% of mapped reads ( )
Found g__GGB57651.s__GGB57651_SGB79249 : 0.84% of mapped reads ( )
Found g__GGB34754.s__GGB34754_SGB82226 : 0.65% of mapped reads ( )
Found t__SGB28829 : 0.58% of mapped reads ( s__Pelagibacterales_bacterium )
Found g__Pseudanabaena.s__Pseudanabaena_sp_FACHB_1050 : 0.54% of mapped reads ( )
Found t__SGB24760 : 0.36% of mapped reads ( s__Anabaena_sp_CRKS33,g__Dolichospermum.s__Dolichospermum_planctonicum,g__Dolichospermum.s__Dolichospermum_flos_aquae,g__Dolichospermum.s__Dolichospermum_sp_FACHB_1091,g__Anabaena.s__Anabaena_sp_FACHB_1250,g__Anabaena.s__Anabaena_sp_FACHB_1391 )
Found t__SGB13518 : 0.28% of mapped reads ( s__Microcystis_aeruginosa,g__Microcystis.s__Microcystis_viridis,g__Microcystis.s__Microcystis_wesenbergii,g__Microcystis.s__Microcystis_sp_0824,g__Microcystis.s__Microcystis_sp_T1_4,g__Microcystis.s__Microcystis_sp_LEGE_00066,g__Microcystis.s__Microcystis_sp_MC19,g__Microcystis.s__Microcystis_sp_LEGE_08355,g__Microcystis.s__Microcystis_flos_aquae )
Found g__Alphaproteobacteria_unclassified.s__alpha_proteobacterium_SCGC_AAA028_D10 : 0.28% of mapped reads ( g__Alphaproteobacteria_unclassified.s__alpha_proteobacterium_SCGC_AAA027_C06,g__Alphaproteobacteria_unclassified.s__alpha_proteobacterium_SCGC_AAA027_L15 )
Found g__GGB73741.s__GGB73741_SGB49722 : 0.27% of mapped reads ( )
Found g__GGB32003.s__GGB32003_SGB45716 : 0.16% of mapped reads ( )
Found g__GGB43055.s__GGB43055_SGB60296 : 0.14% of mapped reads ( )
Found g__GGB24725.s__GGB24725_SGB36612 : 0.13% of mapped reads ( )
Found g__Limnohabitans.s__Limnohabitans_sp_103DPR2 : 0.11% of mapped reads ( g__Limnohabitans.s__Limnohabitans_sp_Hippo4 )
Found g__Actinomycetia_unclassified.s__actinobacterium_SCGC_AAA028_A23 : 0.10% of mapped reads ( )
Found g__Pseudanabaena.s__Pseudanabaena_yagii : 0.10% of mapped reads ( )
Found t__SGB5711 : 0.09% of mapped reads ( s__Candidatus_Nanopelagicus_limnes )
Found t__SGB13423 : 0.08% of mapped reads ( s__Planktothrix_agardhii,g__Planktothrix.s__Planktothrix_rubescens,g__Planktothrix.s__Planktothrix_prolifica )
Found g__GGB56956.s__GGB56956_SGB78416 : 0.08% of mapped reads ( )
Found g__GGB32489.s__GGB32489_SGB48813 : 0.08% of mapped reads ( )
Found g__GGB62809.s__GGB62809_SGB85028 : 0.07% of mapped reads ( )
Found t__SGB24763 : 0.06% of mapped reads ( s__Sphaerospermopsis_kisseleviana,g__Sphaerospermopsis.s__Sphaerospermopsis_kisseleviana,g__Sphaerospermopsis.s__Sphaerospermopsis_sp_FACHB_1194,g__Sphaerospermopsis.s__Sphaerospermopsis_sp_LEGE_08334,g__Sphaerospermopsis.s__Sphaerospermopsis_sp_FACHB_1094,g__Sphaerospermopsis.s__Sphaerospermopsis_reniformis,g__Sphaerospermopsis.s__Sphaerospermopsis_sp_LEGE_00249 )
Found g__GGB46492.s__GGB46492_SGB64353 : 0.02% of mapped reads ( )
Found g__GGB43954.s__GGB43954_SGB61319 : 0.02% of mapped reads ( )
Found g__GGB44382.s__GGB44382_SGB61797 : 0.01% of mapped reads ( )
Total species selected from prescreen: 71
Selected species explain 99.99% of predicted community composition
Creating custom ChocoPhlAn database …
Running bowtie2-build …
Running bowtie2 …
Total bugs from nucleotide alignment: 13
g__Cuspidothrix.s__Cuspidothrix_issatschenkoi: 46989 hits
g__Anabaena.s__Anabaena_sp_CRKS33: 10985 hits
g__Pelagibacterales_unclassified.s__Pelagibacterales_bacterium: 6402 hits
g__Cylindrospermopsis.s__Cylindrospermopsis_raciborskii: 30055 hits
g__Phenylobacterium.s__Phenylobacterium_sp_HYN0004: 28182 hits
g__Microcystis.s__Microcystis_aeruginosa: 12437 hits
g__Raphidiopsis.s__Raphidiopsis_brookii: 47806 hits
g__Sphaerospermopsis.s__Sphaerospermopsis_kisseleviana: 3140 hits
g__Candidatus_Nanopelagicus.s__Candidatus_Nanopelagicus_limnes: 7823 hits
g__Planktothrix.s__Planktothrix_agardhii: 1498 hits
g__Microcystis.s__Microcystis_flos_aquae: 1067 hits
g__Microcystis.s__Microcystis_wesenbergii: 661 hits
g__Planktothrix.s__Planktothrix_rubescens: 835 hits
Total gene families from nucleotide alignment: 7904
Unaligned reads after nucleotide alignment: 99.2391962932 %
Running diamond …
Aligning to reference database: uniref50_201901b_ec_filtered.dmnd
Total bugs after translated alignment: 14
g__Cuspidothrix.s__Cuspidothrix_issatschenkoi: 46989 hits
g__Anabaena.s__Anabaena_sp_CRKS33: 10985 hits
g__Pelagibacterales_unclassified.s__Pelagibacterales_bacterium: 6402 hits
g__Cylindrospermopsis.s__Cylindrospermopsis_raciborskii: 30055 hits
g__Phenylobacterium.s__Phenylobacterium_sp_HYN0004: 28182 hits
g__Microcystis.s__Microcystis_aeruginosa: 12437 hits
g__Raphidiopsis.s__Raphidiopsis_brookii: 47806 hits
g__Sphaerospermopsis.s__Sphaerospermopsis_kisseleviana: 3140 hits
g__Candidatus_Nanopelagicus.s__Candidatus_Nanopelagicus_limnes: 7823 hits
g__Planktothrix.s__Planktothrix_agardhii: 1498 hits
g__Microcystis.s__Microcystis_flos_aquae: 1067 hits
g__Microcystis.s__Microcystis_wesenbergii: 661 hits
g__Planktothrix.s__Planktothrix_rubescens: 835 hits
unclassified: 1965349 hits
Total gene families after translated alignment: 55375
Unaligned reads after translated alignment: 92.6032063024 %
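The numbers in the log above can be used to quantify how little of the sample is assigned. This is a minimal sketch that tallies the hit counts copied from the log and derives the share of reads aligned at each step from the two unaligned percentages. It assumes both percentages are relative to the same total number of input reads, which is a reading of the log rather than something it states. Note that "hits" are alignments, not reads, and that the classified counts are identical in both lists, so every hit added by the DIAMOND step is listed as unclassified.

```python
# Hit counts copied from the "Total bugs" lists in the log above.
classified_hits = {
    "Cuspidothrix_issatschenkoi": 46989,
    "Anabaena_sp_CRKS33": 10985,
    "Pelagibacterales_bacterium": 6402,
    "Cylindrospermopsis_raciborskii": 30055,
    "Phenylobacterium_sp_HYN0004": 28182,
    "Microcystis_aeruginosa": 12437,
    "Raphidiopsis_brookii": 47806,
    "Sphaerospermopsis_kisseleviana": 3140,
    "Candidatus_Nanopelagicus_limnes": 7823,
    "Planktothrix_agardhii": 1498,
    "Microcystis_flos_aquae": 1067,
    "Microcystis_wesenbergii": 661,
    "Planktothrix_rubescens": 835,
}
unclassified_hits = 1965349  # "unclassified" entry after translated alignment

total_classified = sum(classified_hits.values())
unclassified_share = unclassified_hits / (total_classified + unclassified_hits)

# Unaligned percentages from the log, assumed to share one denominator
# (all input reads), so their differences give the share aligned per step.
unaligned_after_nucleotide = 99.2391962932
unaligned_after_translated = 92.6032063024
aligned_by_nucleotide = 100 - unaligned_after_nucleotide
aligned_by_translated = unaligned_after_nucleotide - unaligned_after_translated

print(f"Classified hits: {total_classified}")
print(f"Unclassified share of all hits: {unclassified_share:.1%}")
print(f"Aligned by bowtie2: {aligned_by_nucleotide:.2f}% of reads")
print(f"Aligned by DIAMOND: {aligned_by_translated:.2f}% of reads")
```

Under that assumption, about 0.76% of reads align in the nucleotide step and about 6.64% more in the translated step, and roughly 90.9% of all hits are unclassified.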
After running MetaPhlAn, 71 species were identified, but most of them (58) are unclassified species. 13 species were classified by MetaPhlAn, which is consistent with the bowtie2 and DIAMOND (UniRef50 EC-filtered) results. These species are mostly associated with algal blooms, which is also my interest. My questions are:
- Is there any way to increase the number of identified species and decrease the percentage of unaligned reads after nucleotide and translated alignment (currently approx. 99% and 92%, respectively)?
- Why would only algal-bloom-related species be found in my sample, even though most of the reads are unaligned?
- What is the difference between the EC-filtered and the non-EC-filtered database?
- If there is no way to increase the percentage of aligned reads, is this percentage typical for this kind of water sample?
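For the first question, it may help to quantify how much of the MetaPhlAn prescreen abundance falls on placeholder genome bins (GGB/SGB names with nothing listed in parentheses) versus taxa that carry a species name. This is a minimal sketch: summarize_prescreen is a hypothetical helper, and treating a GGB genus with empty parentheses as unnamed is an assumption based on the naming in this log. It counts "Found" lines, so its totals need not match HUMAnN's count of 71 selected species, since some lines list several species.

```python
import re

# Matches prescreen lines such as
# "Found t__SGB24761 : 2.51% of mapped reads ( s__Cuspidothrix_issatschenkoi )"
FOUND_LINE = re.compile(r"^Found (\S+) : ([\d.]+)% of mapped reads \((.*)\)\s*$")


def summarize_prescreen(log_text):
    """Split prescreen hits into placeholder (GGB-only) and named taxa."""
    placeholder, named = {}, {}
    for line in log_text.splitlines():
        match = FOUND_LINE.match(line.strip())
        if not match:
            continue  # skip every other kind of log line
        taxon, percent, alternatives = match.groups()
        # Assumption: a GGB genus with no alternative names has no named species.
        is_placeholder = taxon.startswith("g__GGB") and not alternatives.strip()
        target = placeholder if is_placeholder else named
        target[taxon] = float(percent)
    return placeholder, named


sample_log = """\
Found g__GGB43952.s__GGB43952_SGB61317 : 24.13% of mapped reads ( )
Found g__GGB43067.s__GGB43067_SGB57480 : 13.42% of mapped reads ( )
Found g__Polynucleobacter.s__Polynucleobacter_sp_MWH_UH24A : 4.19% of mapped reads ( )
Found t__SGB24761 : 2.51% of mapped reads ( s__Cuspidothrix_issatschenkoi )
Total species selected from prescreen: 71
"""
placeholder, named = summarize_prescreen(sample_log)
print(f"placeholder bins: {len(placeholder)}, {sum(placeholder.values()):.2f}%")
print(f"named taxa: {len(named)}, {sum(named.values()):.2f}%")
```

Running it on the full prescreen section of the attached log would give the same split for the whole sample.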
(I have also attached the log file from my run for reference.)
run0043_lane9_read2_indexN726-S518=ENN-8-17-16.txt (113.8 KB)
Thanks,
Hassan