Command for preparing species abundance table

Hi friends!!! @fbeghini
I'm new to Linux commands and just curious to know whether this command (used when preparing the species abundance file)

grep -E "(s__)|(^ID)" merged_abundance_table.txt | grep -v "t__" | sed 's/^.*s__//g' > merged_abundance_table_species.txt

can be replaced by the following command:

grep "(^ID)|(s__)" merged_abundance_table.txt | grep -v "t__" | sed "s/.*s__//g" > merged_abundance_table_species.txt

Thanks,
DC7

The -E option enables extended regular expressions in grep, including the pipe character for alternation. If you do not specify -E, grep treats the parentheses and the pipe as literal characters, so it will not retrieve any lines.
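
A minimal sketch of the difference, run on a small made-up file (demo.txt and its contents are hypothetical, not the real merged table):

```shell
# Hypothetical three-line sample in the style of a merged abundance table
printf 'ID\tsample1\n' > demo.txt
printf 'k__Bacteria|g__Escherichia\t10.0\n' >> demo.txt
printf 'k__Bacteria|g__Escherichia|s__Escherichia_coli\t10.0\n' >> demo.txt

# Basic regex (no -E): ( ) and | are literal characters, so no line matches.
# grep exits non-zero when nothing matches, hence the "|| true".
basic_count=$(grep -c "(^ID)|(s__)" demo.txt || true)

# Extended regex (-E): | means "or", so the header and the species row match.
extended_count=$(grep -c -E "(^ID)|(s__)" demo.txt)

echo "without -E: $basic_count, with -E: $extended_count"
```

On this sample the count without -E is 0 and the count with -E is 2 (the header plus the one species row).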

Oh, I missed the -E. Even with -E added, I've still changed a few characters: I removed the '^' in the sed expression and swapped the order of '(^ID)' and '(s__)'. Do you think the following two commands will produce the same output?

My Command:

grep -E "(^ID)|(s__)" merged_abundance_table.txt | grep -v "t__" | sed "s/.*s__//g" > merged_abundance_table_species.txt

Tutorial command:

grep -E "(s__)|(^ID)" merged_abundance_table.txt | grep -v "t__" | sed 's/^.*s__//g' > merged_abundance_table_species.txt

Thanks,
DC7

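Swapping (s__) and (^ID) does not change the output: in both cases grep reports the header and the species list in the same order, because lines are printed in file order regardless of which alternative matched.
Removing the caret lets the sed pattern start matching anywhere in the line rather than only at its beginning. In this case the result is the same, because .* is greedy and sed takes the leftmost match, so everything from the start of the line up to the last s__ is still removed.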
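
One way to check this is to run both pipelines on a small made-up table and compare the results. This is only a sketch: demo_table.txt and its rows are hypothetical, and the real merged table may contain cases this sample does not cover.

```shell
# Hypothetical sample: header, a genus row, a species row, and a strain row
printf 'ID\tsample1\n' > demo_table.txt
printf 'k__Bacteria|g__Escherichia\t10.0\n' >> demo_table.txt
printf 'k__Bacteria|g__Escherichia|s__Escherichia_coli\t7.5\n' >> demo_table.txt
printf 'k__Bacteria|g__Escherichia|s__Escherichia_coli|t__strain_X\t7.5\n' >> demo_table.txt

# Tutorial command: s__ listed first, caret kept in the sed pattern
tutorial_out=$(grep -E "(s__)|(^ID)" demo_table.txt | grep -v "t__" | sed 's/^.*s__//g')

# Modified command: ^ID listed first, caret dropped from the sed pattern
modified_out=$(grep -E "(^ID)|(s__)" demo_table.txt | grep -v "t__" | sed "s/.*s__//g")

# Compare the two results
if [ "$tutorial_out" = "$modified_out" ]; then
    echo "identical output"
else
    echo "outputs differ"
fi
```

On this sample both variables hold the header line followed by the Escherichia_coli row, so the script prints "identical output". To compare on real data, the same two pipelines can be redirected to separate files and checked with diff.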


Thanks a lot @fbeghini. One more thing…
Somewhere I've seen that the "--ncores" option doesn't work when the input is a compressed file (e.g. fastq.gz, .bz2, etc.). But I used it as "--ncores 16" and still got a profile output. Could there be any mistake in my profile output because of that option? Is the following command considered wrong?

 metaphlan catenated.fastq.gz --input_type fastq --nproc 16 > catenated_profile.txt

NOTE: I'm using MetaPhlAn 3.0

What do you mean by the "--ncores" option not working?

I have found the following lines from this link:
"Notice we are piping the fastq reads directly from the compressed archive to MetaPhlAn. When piping MetaPhlAn's internal parallelization option (--nproc option) can not be used. This is available when the input is a an uncompressed file."

It means the --nproc option will not work with a compressed input file. But for my analysis I used --nproc, as in the following command, and still got a profile output. So, do you think there may be any mistake in my output because I used --nproc with a compressed input file?

metaphlan catenated.fastq.gz --input_type fastq --nproc 16 > catenated_profile.txt

NOTE: I'm using MetaPhlAn 3.0

Thanks and regards,
DC7

The page you linked is the manual for MetaPhlAn 1, not MetaPhlAn 3.0. Starting from version 2, the parallelization works with both compressed and uncompressed input files.
