Maaslin2 changes the feature names

Hi,

The Maaslin2 function changes several characters (e.g. semicolon, hyphen, square bracket and parenthesis) in the taxonomic names from Silva-db to a period (i.e. a .). How can this be avoided?

I get problems when I want to compare results from other analysis tools that keep the original names.

Likewise, an X is added to feature names that start with a number (e.g. ASV identifiers from QIIME 2).

The latter problem I can easily fix with regex, but the first mentioned problem is more difficult. Patterns that are predictable I can fix provided I know the characters that change to periods. However, since I can’t predict when there is a parenthesis or square bracket, I can’t correct the names.

Sincerely,
Jan

Examples of predictable patterns (e.g. ;p__) and unpredictable patterns:

Contains parentheses:
d__Bacteria;p__Patescibacteria;c__Gracilibacteria;o__Absconditabacteriales_(SR1);f__Absconditabacteriales_(SR1);g__Absconditabacteriales_(SR1)

Contains square brackets:
d__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__[Eubacterium]_hallii_group

Contains a hyphen:
d__Bacteria;p__Firmicutes;c__Negativicutes;o__Veillonellales-Selenomonadales;f__Veillonellaceae;g__Veillonella

sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux 8.7 (Ootpa)


Hi @Jan

Unfortunately this is due to data frames in R not liking column names that start with a number. Because of this, base R automatically adds the character “X” in front of any column name that starts with a number.

Similarly, R does not like some punctuation in column names, so those characters are replaced by “.”. My suggestion would be to replace all punctuation with “.” in your column names using regex, and then do the same for your original names so you can match between them.
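A minimal sketch of that matching step, assuming the renaming follows base R's `make.names()` (the function `data.frame()` applies when `check.names = TRUE`). Whether Maaslin2 renames features in exactly this way is an assumption here, so compare a few names against your own output first. The `results` data frame, its `feature` column, and the ASV-style identifier are hypothetical stand-ins for illustration.

```r
# Original feature names as they appear in the input table
# (the three taxonomy strings are the examples from this thread).
original <- c(
  "d__Bacteria;p__Patescibacteria;c__Gracilibacteria;o__Absconditabacteriales_(SR1);f__Absconditabacteriales_(SR1);g__Absconditabacteriales_(SR1)",
  "d__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__[Eubacterium]_hallii_group",
  "d__Bacteria;p__Firmicutes;c__Negativicutes;o__Veillonellales-Selenomonadales;f__Veillonellaceae;g__Veillonella",
  "0a1b2c"  # hypothetical ASV-style identifier that starts with a digit
)

# Assumption: the sanitised names are what make.names() would produce,
# i.e. invalid characters become "." and an "X" is prepended where needed.
sanitised <- make.names(original)

# Lookup table: sanitised name -> original name.
lookup <- setNames(original, sanitised)

# If two originals collapse to the same sanitised name, the mapping is ambiguous.
if (anyDuplicated(sanitised) > 0) {
  warning("Some original names map to the same sanitised name.")
}

# Hypothetical stand-in for the feature column of a results table.
results <- data.frame(feature = sanitised[c(3, 1)], stringsAsFactors = FALSE)

# Map the sanitised feature names back to the original names.
results$original_name <- unname(lookup[results$feature])
```

Building the lookup from the original names avoids having to guess which periods used to be parentheses, brackets, hyphens or semicolons. Any feature that comes back as `NA` in `original_name` was renamed by a different rule than the one assumed here.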

cheers,
Jacob Nearing
