Hi friends!!! @fbeghini
I'm new to Linux commands and just curious to know whether this command (used while preparing the species abundance file)
grep -E "(s__)|(^ID)" merged_abundance_table.txt | grep -v "t__" | sed 's/^.*s__//g' > merged_abundance_table_species.txt
can be replaced by the following command:
grep "(^ID)|(s__)" merged_abundance_table.txt | grep -v "t__" | sed "s/.*s__//g" > merged_abundance_table_species.txt
Thanks,
DC7
grep -E allows the usage of extended regex with grep, including the usage of the pipe character. If you do not specify the -E option, grep will not retrieve any lines, because the parentheses and the pipe are then treated as literal characters.
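To see the effect of -E on a small example, here is a minimal sketch. The three-line table is made up for illustration and only mimics the layout of merged_abundance_table.txt; it is not real MetaPhlAn output.

```shell
# Toy table (hypothetical contents, tab-separated like the merged table).
tmp=$(mktemp)
printf 'ID\tsample1\n' > "$tmp"
printf 'k__Bacteria|p__Firmicutes\t60.0\n' >> "$tmp"
printf 'k__Bacteria|p__Firmicutes|s__Foo_bar\t40.0\n' >> "$tmp"

# With -E the pipe means "or": the header row and the species row match.
with_E=$(grep -E -c '(s__)|(^ID)' "$tmp" || true)

# Without -E the pattern is a basic regex, where ( ) and | are literal
# characters, so grep looks for the literal text "(s__)|(^ID)".
without_E=$(grep -c '(s__)|(^ID)' "$tmp" || true)

echo "matching lines with -E:    $with_E"
echo "matching lines without -E: $without_E"
rm -f "$tmp"
```

On this toy input the first count should be 2 and the second 0, which is why the command without -E produces an empty species file.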
Oh… I missed the 'E'… Even after adding 'E', I've still shuffled some characters: I removed the '^' after 'sed' and swapped the order of '(^ID)' and '(s__)'… Do you think the following two will produce the same output?
My Command:
grep -E "(^ID)|(s__)" merged_abundance_table.txt | grep -v "t__" | sed "s/.*s__//g" > merged_abundance_table_species.txt
Tutorial command:
grep -E "(s__)|(^ID)" merged_abundance_table.txt | grep -v "t__" | sed 's/^.*s__//g' > merged_abundance_table_species.txt
Thanks,
DC7
Shuffling (s__) and (^ID) does not change the output; in both cases grep will report the header and the species list in the same order.
Removing the caret means the sed pattern is no longer anchored to the start of the line. In this case the output is still the same: .* is greedy and the match is taken from the leftmost possible position, so everything from the start of the line through the last s__ is removed either way.
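One way to check this yourself is to run both pipelines on the same small input and compare the results. The sketch below uses a made-up four-line table (hypothetical taxa and values), not the real merged_abundance_table.txt.

```shell
# Toy table (hypothetical contents): header, phylum row, species row, strain row.
tmp=$(mktemp)
printf 'ID\tsample1\n' > "$tmp"
printf 'k__Bacteria|p__Firmicutes\t60.0\n' >> "$tmp"
printf 'k__Bacteria|p__Firmicutes|s__Foo_bar\t40.0\n' >> "$tmp"
printf 'k__Bacteria|p__Firmicutes|s__Foo_bar|t__Strain1\t40.0\n' >> "$tmp"

# Tutorial command: anchored sed pattern.
tutorial=$(grep -E '(s__)|(^ID)' "$tmp" | grep -v 't__' | sed 's/^.*s__//g')

# Modified command: alternatives swapped, no caret in the sed pattern.
mine=$(grep -E '(^ID)|(s__)' "$tmp" | grep -v 't__' | sed 's/.*s__//g')

if [ "$tutorial" = "$mine" ]; then
  echo "identical output"
else
  echo "different output"
fi
printf '%s\n' "$tutorial"
rm -f "$tmp"
```

The same comparison can be done on the real table by redirecting each pipeline to its own file and running diff on the two files.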
Thanks a lot @fbeghini. one more thing…
Somewhere I've seen that the "--ncores" option doesn't work when we use a zipped file (e.g. fastq.gz, .bz2, etc.). But I've used the "--ncores" option as "--ncores 16" and still got profile output. Could there be any mistake in my profile output due to that "--ncores" option? Is the following considered a wrong command?
metaphlan catenated.fastq.gz --input_type fastq --nproc 16 > catenated_profile.txt
NOTE: I'm using MetaPhlAn 3.0
What do you mean by the "--ncores" option not working?
I have found the following lines from this link:
"Notice we are piping the fastq reads directly from the compressed archive to MetaPhlAn. When piping MetaPhlAn's internal parallelization option (--nproc option) can not be used. This is available when the input is an uncompressed file."
It means the --nproc option will not work when we use a zipped file. But for my analysis I've used the --nproc option as in the following command and got profile output. So, do you think there may be any mistake in my output, as I have used the --nproc option with a zipped input file?
metaphlan catenated.fastq.gz --input_type fastq --nproc 16 > catenated_profile.txt
NOTE: I'm using MetaPhlAn 3.0
Thanks and regards,
DC7
The page you linked is the manual for MetaPhlAn 1, not MetaPhlAn 3.0. Starting from version 2, the parallelization works with both compressed and uncompressed input files.