Hi,
The Maaslin2 function changes several characters (e.g. semicolons, hyphens, square brackets and parentheses) in the taxonomic names from the SILVA database to a period (i.e. a .). How can this be avoided?
This causes problems when I want to compare the results with those from other analysis tools that keep the original names.
Likewise, an X is added to feature names that start with a number (e.g. ASV identifiers from QIIME 2).
The latter problem I can easily fix with a regex, but the former is more difficult. I can reverse predictable patterns, provided I know which characters were changed to periods. However, since I can't predict when a name contains a parenthesis or square bracket, I can't restore the original names.
Sincerely,
Jan
Examples of predictable patterns (e.g. ;p__) and unpredictable patterns:
Contains parentheses:
d__Bacteria;p__Patescibacteria;c__Gracilibacteria;o__Absconditabacteriales_(SR1);f__Absconditabacteriales_(SR1);g__Absconditabacteriales_(SR1)
Contains square brackets:
d__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__[Eubacterium]_hallii_group
Contains a hyphen:
d__Bacteria;p__Firmicutes;c__Negativicutes;o__Veillonellales-Selenomonadales;f__Veillonellaceae;g__Veillonella
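One possible workaround, sketched under the assumption that the renaming follows the rules of base R's make.names() (invalid characters become a period and an X is prepended to names starting with a digit): instead of trying to reverse the substitution, build a lookup table from the original names before running the analysis and map the changed names back afterwards. The helper restore_names() and the ASV-style identifier below are hypothetical and only for illustration; I have not confirmed that this is exactly what Maaslin2 does internally.

```r
# Original feature names, exactly as they appear in the input table.
original <- c(
  "d__Bacteria;p__Patescibacteria;c__Gracilibacteria;o__Absconditabacteriales_(SR1);f__Absconditabacteriales_(SR1);g__Absconditabacteriales_(SR1)",
  "d__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__[Eubacterium]_hallii_group",
  "d__Bacteria;p__Firmicutes;c__Negativicutes;o__Veillonellales-Selenomonadales;f__Veillonellaceae;g__Veillonella",
  "1a2b3c"  # hypothetical identifier that starts with a digit
)

# Assumption: the changed names are what make.names() would produce.
# The lookup is a named vector: names = changed names, values = original names.
lookup <- setNames(original, make.names(original))

# Two different original names could collapse to the same changed name;
# stop early if that happens, because the mapping would be ambiguous.
stopifnot(anyDuplicated(names(lookup)) == 0)

# Hypothetical helper: map changed names back to the original ones.
# Names that are not in the lookup are returned unchanged.
restore_names <- function(changed, lookup) {
  restored <- unname(lookup[changed])
  ifelse(is.na(restored), changed, restored)
}

# Round trip: changed names map back to the original names.
restored <- restore_names(make.names(original), lookup)
print(identical(restored, original))
```

The same helper could then be applied to the feature column of the results table before comparing it with the output of other tools. This only works if the changed names really match make.names(), so it is worth checking a few features by hand first.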
sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux 8.7 (Ootpa)