The Maaslin2 function changes several characters (e.g. semicolon, hyphen, square bracket and parenthesis) in the taxonomic names from Silva-db to a period (i.e. a .). How can this be avoided?

I get problems when I want to compare results from other analysis tools that keep the original names.

Likewise, an X is added to feature names that start with a number (e.g. ASV identifiers from QIIME 2).

I can easily fix the latter problem with a regex, but the first is more difficult. I can fix predictable patterns as long as I know which characters were changed to periods. However, since I can’t predict where a parenthesis or square bracket will occur, I can’t restore those names.


Examples of predictable patterns (e.g. ;p__) and unpredictable patterns:

Contain parenthesis:

Contain square bracket:

Contain hyphen:

R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux 8.7 (Ootpa)

Hi @Jan

Unfortunately, this is because data frames in R do not like column names that start with a number. As a result, base R automatically adds the character “X” in front of any column name that starts with a number.

Similarly, R does not like some punctuation in column names, so those characters are replaced by “.”. My suggestion would be to replace all punctuation with “.” in your column names using regex, and then do the same for your original names so that the two sets can be matched.
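One way to do that matching without hand-written regex is sketched below. It assumes the renaming follows base R's `make.names()` rules, which is what `data.frame()` applies when `check.names = TRUE`; whether the Maaslin2 output matches these rules exactly for every name is an assumption worth checking on your own data. The feature names and the `results_feature` vector are made up for illustration.

```r
# Hypothetical feature names, as they might appear in the input table.
original <- c(
  "d__Bacteria;p__Firmicutes;g__[Eubacterium]_hallii_group",
  "d__Bacteria;p__Proteobacteria;g__Escherichia-Shigella",
  "d__Bacteria;p__Firmicutes;g__Clostridium_(sensu_stricto)_1",
  "1a2b3c"  # ASV-style identifier that starts with a digit
)

# make.names() replaces invalid characters with "." and prepends "X"
# where a name would otherwise not be syntactically valid.
lookup <- data.frame(
  original  = original,
  sanitized = make.names(original),
  stringsAsFactors = FALSE
)

# Two different originals can collapse to the same sanitized name;
# stop here if that happens, because the mapping would be ambiguous.
stopifnot(anyDuplicated(lookup$sanitized) == 0)

# `results_feature` stands in for the feature column of a results table.
results_feature <- c(
  "X1a2b3c",
  "d__Bacteria.p__Proteobacteria.g__Escherichia.Shigella"
)

# Map each sanitized name back to its original spelling.
restored <- lookup$original[match(results_feature, lookup$sanitized)]
```

Because the lookup table is built from your own original names, you do not need to know in advance where a bracket, parenthesis or hyphen occurs. Any `NA` in `restored` would point to a name that was changed in some way other than what `make.names()` does.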

Jacob Nearing

