Command for preparing species abundance table

DEEPCHANDA7 · June 6, 2020, 2:24pm

Hi friends!!! @fbeghini
I'm new to Linux commands and am curious whether this command (used when preparing the species abundance file)

grep -E "(s__)|(^ID)" merged_abundance_table.txt | grep -v "t__" | sed 's/^.*s__//g' > merged_abundance_table_species.txt

can be replaced by the following one:

grep "(^ID)|(s__)" merged_abundance_table.txt | grep -v "t__" | sed "s/.*s__//g" > merged_abundance_table_species.txt

Thanks,
DC7

fbeghini · June 8, 2020, 7:52am

grep -E enables extended regular expressions, in which the pipe character (|) separates alternatives and parentheses group them. If you do not specify the -E option, grep treats these characters literally, so your second command will not retrieve any lines.
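A minimal sketch of the difference, assuming GNU grep and a tiny tab-separated table made up for illustration (the file name sample_table.txt and its rows are hypothetical, not real MetaPhlAn output):

```shell
#!/usr/bin/env bash
# Hypothetical mini table: header, a phylum line, a species line, a strain line.
printf 'ID\tS1\tS2\n' > sample_table.txt
printf 'k__Bacteria|p__Firmicutes\t60\t40\n' >> sample_table.txt
printf 'k__Bacteria|p__Firmicutes|s__Species_a\t30\t20\n' >> sample_table.txt
printf 'k__Bacteria|p__Firmicutes|s__Species_a|t__Strain_1\t30\t20\n' >> sample_table.txt

# Basic regex (no -E): "(", ")" and "|" are ordinary characters, so grep
# searches for the literal text "(^ID)|(s__)" and finds no line.
grep -c "(^ID)|(s__)" sample_table.txt      # prints 0

# Extended regex (-E): "|" separates alternatives, so the header line and
# both lines containing s__ (species and strain) match.
grep -E -c "(^ID)|(s__)" sample_table.txt   # prints 3
```

GNU grep also understands \| as alternation without -E, but that is a GNU extension; -E is the POSIX way to get it.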

DEEPCHANDA7 · June 8, 2020, 9:50am

Oh... I missed the -E. But even with -E added, I've still changed a few characters: I removed the '^' from the sed expression and swapped the order of '(^ID)' and '(s__)'. Do you think the following two commands will produce the same output?

My command:

grep -E "(^ID)|(s__)" merged_abundance_table.txt | grep -v "t__" | sed "s/.*s__//g" > merged_abundance_table_species.txt

Tutorial command:

grep -E "(s__)|(^ID)" merged_abundance_table.txt | grep -v "t__" | sed 's/^.*s__//g' > merged_abundance_table_species.txt

Thanks,
DC7

fbeghini · June 8, 2020, 11:37am

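Swapping (s__) and (^ID) does not change the output: in both cases grep reports the header and the species lines in the order they appear in the file.
Removing the caret does not change the output either. sed applies the pattern to the whole line (all columns together), the leftmost match of .*s__ already starts at the beginning of the line, and the greedy .* extends it up to the last s__, so both expressions delete exactly the same text.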
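To check the two pipelines against each other, one can run both on the same input and compare the results. A minimal sketch, assuming GNU grep/sed and a made-up tab-separated table (demo_table.txt, tutorial_out.txt and variant_out.txt are hypothetical file names). On this input cmp reports no difference: because .* can already start at the first character of the line, the leftmost match of .*s__ begins there with or without ^. The double quotes around the sed script make no difference here either, since the script contains no $, backtick or backslash for the shell to expand.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for merged_abundance_table.txt (tab-separated).
printf 'ID\tS1\tS2\n' > demo_table.txt
printf 'k__Bacteria|p__Firmicutes\t60\t40\n' >> demo_table.txt
printf 'k__Bacteria|p__Firmicutes|s__Species_a\t30\t20\n' >> demo_table.txt
printf 'k__Bacteria|p__Firmicutes|s__Species_a|t__Strain_1\t30\t20\n' >> demo_table.txt

# Tutorial command: keep header + lines with s__, drop strain (t__) lines,
# then delete everything up to and including the last "s__" on each line.
grep -E "(s__)|(^ID)" demo_table.txt | grep -v "t__" | sed 's/^.*s__//g' > tutorial_out.txt

# Variant: alternatives swapped, no caret, double-quoted sed script.
grep -E "(^ID)|(s__)" demo_table.txt | grep -v "t__" | sed "s/.*s__//g" > variant_out.txt

# cmp is silent and exits 0 when the two files are byte-identical.
cmp tutorial_out.txt variant_out.txt && echo "identical"

# Show the result: the header line and one species line.
cat tutorial_out.txt
```

Running the same cmp on the real merged_abundance_table.txt is the most direct way to confirm the result for your own data.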

DEEPCHANDA7 · June 8, 2020, 11:40am

Thanks a lot @fbeghini. One more thing...
I've read somewhere that the "--ncores" option doesn't work with compressed input files (e.g. .fastq.gz or .bz2). But I used it as "--ncores 16" and still got a profile output. Could my profile output be wrong because of that option? Is the following command wrong?

metaphlan catenated.fastq.gz --input_type fastq --nproc 16 > catenated_profile.txt

NOTE: I'm using MetaPhlAn 3.0

fbeghini · June 9, 2020, 11:09am

What do you mean by "the --ncores option doesn't work"?

DEEPCHANDA7 · June 9, 2020, 1:32pm

I found the following lines at this link:
"Notice we are piping the fastq reads directly from the compressed archive to MetaPhlAn. When piping MetaPhlAn's internal parallelization option (--nproc option) can not be used. This is available when the input is an uncompressed file."

This means the --nproc option does not work with a compressed input file. But in my analysis I used --nproc, as in the following command, and got a profile output. So, do you think my output could be wrong because I used --nproc with a compressed input file?

metaphlan catenated.fastq.gz --input_type fastq --nproc 16 > catenated_profile.txt

NOTE: I'm using MetaPhlAn 3.0

Thanks and regards,
DC7

fbeghini · June 9, 2020, 1:49pm

The page you linked is the manual for MetaPhlAn 1, not MetaPhlAn 3.0. Starting from version 2, the parallelization works with both compressed and uncompressed input files.
