Merge_metaphlan_tables.py with absolute abundance

Hi all!
In case someone needs it, here is a modified script that merges absolute abundances instead of relative abundances from metaphlan output (you should run metaphlan with the ‘-t rel_ab_w_read_stats’ option).
merge_metaphlan_tables_abs.txt (3.6 KB)
Rename it to merge_metaphlan_tables_abs.py and make it executable. Usage is the same as for relative abundance merging.
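
For anyone curious what merging the absolute column involves, here is a minimal sketch. It is not the attached script: it uses only the standard library, and the names read_abs_counts and merge_abs are made up for illustration. It assumes each profile has a tab-separated ‘#clade_name’ header line that lists estimated_number_of_reads_from_the_clade, and that the counts are integers. The two in-memory samples are toy values, not real data.

```python
import io

# Column written when MetaPhlAn is run with '-t rel_ab_w_read_stats'.
ABS_COLUMN = "estimated_number_of_reads_from_the_clade"


def read_abs_counts(handle):
    """Return {clade_name: estimated read count} for one profile (hypothetical helper)."""
    header = None
    counts = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Only the '#clade_name ...' comment line names the columns.
            if line.startswith("#clade_name"):
                header = line.lstrip("#").split()
            continue
        if header is None or ABS_COLUMN not in header:
            raise ValueError("no " + ABS_COLUMN + " column in this profile")
        fields = line.split("\t")
        # Assumption: counts are written as integers.
        counts[fields[0]] = int(fields[header.index(ABS_COLUMN)])
    return counts


def merge_abs(profiles):
    """profiles maps sample name -> open file. Clades missing from a sample get 0."""
    per_sample = {name: read_abs_counts(h) for name, h in profiles.items()}
    clades = sorted(set().union(*per_sample.values()))
    table = {
        name: [counts.get(clade, 0) for clade in clades]
        for name, counts in per_sample.items()
    }
    return clades, table


# Toy input, made up for this example.
HEADER = "#clade_name\tclade_taxid\trelative_abundance\tcoverage\t" + ABS_COLUMN + "\n"
sample_a = io.StringIO(
    HEADER
    + "k__Bacteria\t2\t100.0\t9.9\t100\n"
    + "k__Bacteria|p__Firmicutes\t2|1239\t60.0\t5.9\t60\n"
)
sample_b = io.StringIO(HEADER + "k__Bacteria\t2\t100.0\t1.0\t50\n")
clades, table = merge_abs({"A": sample_a, "B": sample_b})
```

Filling missing clades with 0 is a choice made for this sketch; check what the script you actually use does.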

Thank you very much. I needed this!

This worked perfectly, thank you!

Thank you very much for this, Tim!

Thanks, Timyerg. This is really what I need!
May I ask where this file should be placed?
I tried putting it with the other files relevant to Metaphlan, but it seems the command is not found.
I would much appreciate any suggestions.

You are welcome!

  1. Download the file. Put it anywhere.
  2. Rename it, replacing the “.txt” extension with “.py”.
  3. Make it executable (replace “path_to” with the path to the file):
    chmod +x path_to/merge_metaphlan_tables_abs.py
  4. Run it like the original script, but call it by its path in the command: path_to/merge_metaphlan_tables_abs.py
Many thanks for your quick reply, Timyerg! It works well!

It is so nice of you! I truly appreciate your work!

Hello, I need this script to merge absolute values, but when I try to change its permissions it says

chmod: changing permissions of 'merge_metaphlan_tables_abs.py': Operation not permitted

Please let me know how I can run it. Thanks!

I also just ran it as below:

python merge_metaphlan_tables_abs.py */metaphlan4/*_profile.txt > merged_abs.txt

But got this error:

  File "/scratch/gencore/novaseq/220702_A00534_0101_BHLFKWDSXX/Unaligned/data/analysis/merge_metaphlan_tables_abs.py", line 84, in <module>
    main()
  File "/scratch/gencore/novaseq/220702_A00534_0101_BHLFKWDSXX/Unaligned/data/analysis/merge_metaphlan_tables_abs.py", line 78, in main
    merge(args.aistms, sys.stdout)
  File "/scratch/gencore/novaseq/220702_A00534_0101_BHLFKWDSXX/Unaligned/data/analysis/merge_metaphlan_tables_abs.py", line 42, in merge
    iIn = pd.read_csv(f,
  File "/scratch/gencore/.eb/2.0/software/metaphlan/4.0.1/lib/python3.10/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/scratch/gencore/.eb/2.0/software/metaphlan/4.0.1/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 678, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/scratch/gencore/.eb/2.0/software/metaphlan/4.0.1/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 575, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/scratch/gencore/.eb/2.0/software/metaphlan/4.0.1/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 932, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/scratch/gencore/.eb/2.0/software/metaphlan/4.0.1/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1234, in _make_engine
    return mapping[engine](f, **self.options)
  File "/scratch/gencore/.eb/2.0/software/metaphlan/4.0.1/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 146, in __init__
    self._validate_usecols_names(
  File "/scratch/gencore/.eb/2.0/software/metaphlan/4.0.1/lib/python3.10/site-packages/pandas/io/parsers/base_parser.py", line 913, in _validate_usecols_names
    raise ValueError(
ValueError: Usecols do not match columns, columns expected but not found: [0, 1, 2, 3, 4]

How can I solve this error, please?

Hello!

You can also change the permission by right-clicking the file and ticking the box that allows executing it as a script.

Regarding the error, I have several questions:

  1. Are you running Metaphlan 3 or Metaphlan 4? I haven’t tested it with the latest version of Metaphlan yet, so it may give an error.

  2. Did you run Metaphlan with the flag ‘-t rel_ab_w_read_stats’? If not, then your profiles simply don’t have a column with absolute counts.

  3. Did you try to run it without specifying “python” in the command? Just by calling the script directly in the terminal.
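
Question 2 can be checked directly on a profile file. The sketch below uses a made-up helper name (has_read_stats) and assumes the column names sit on the ‘#clade_name’ comment line, as in profiles produced with ‘-t rel_ab_w_read_stats’. It simply looks for the read-count column there.

```python
# Column written when MetaPhlAn is run with '-t rel_ab_w_read_stats'.
ABS_COLUMN = "estimated_number_of_reads_from_the_clade"


def has_read_stats(lines):
    """True if the '#clade_name' header line lists the read-count column."""
    for line in lines:
        if not line.startswith("#"):
            break  # the header comments are over, stop looking
        if line.startswith("#clade_name"):
            return ABS_COLUMN in line.lstrip("#").split()
    return False
```

Pass it an open file, for example has_read_stats(open(path)), where path is one of your own profile files. If it returns False, the profile needs to be regenerated with the flag before merging.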

Before, I ran it without specifying “-t rel_ab_w_read_stats”.

Later, I added the above flag and ran both metaphlan 3 and 4, but now I get the following error when I run the merge script:

Traceback (most recent call last):
  File "merge_metaphlan_tables_abs.py", line 7, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'

If I run it without mentioning “python” it gives the following error:

-bash: merge_metaphlan_tables_abs.py: command not found

You need to install the pandas module, which the script uses to merge the tables.
“conda install -c anaconda pandas” should do the trick.

For the “command not found” error, you should indicate the path to the script as well.
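
On the pandas point, a quick way to confirm that the module is visible is to ask the same Python interpreter that will run the merge script. This is a small sketch with a made-up helper name (missing_modules). It only checks top-level module names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means pandas can be imported by this interpreter.
missing = missing_modules(["pandas"])
```

If pandas is reported missing after installing it, the install probably went into a different environment than the one running the script.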

Yes, installing pandas solved the issue and I was able to merge tables from metaphlan3 and metaphlan4. Thank you so much.

Hi -
I am trying to merge Metaphlan 4 outputs using the merge_metaphlan_tables_abs.py script. The files were created using the “-t rel_ab_w_read_stats” option.
I am getting the following error:

Traceback (most recent call last):
  File "./merge_metaphlan_tables_abs.py", line 84, in <module>
    main()
  File "./merge_metaphlan_tables_abs.py", line 78, in main
    merge(args.aistms, sys.stdout)
  File "./merge_metaphlan_tables_abs.py", line 45, in merge
    names = names,
UnboundLocalError: local variable 'names' referenced before assignment

Any help would be great!

Hello!
It looks like you are hitting the issue that is described here:

The MetaPhlan developers have already fixed it; you need to update the script that merges the tables. I am not sure, but I think they also added the possibility to merge absolute abundances instead of relative ones. My modification of their script was tested only with MetaPhlan3 and is outdated. I will run MetaPhlan4 in a couple of weeks and, if needed, will update the script (in case this is not yet implemented in the original script).

This modified script now works with Metaphlan4. It is the original script from Metaphlan4, modified to merge “absolute” counts instead of relative ones.

  1. Download the file. Put it anywhere.
    merge_metaphlan_tables_abs.txt (3.3 KB)
  2. Rename it, replacing the “.txt” extension with “.py”.
  3. Make it executable (replace “path_to” with the path to the file):
    chmod +x path_to/merge_metaphlan_tables_abs.py
  4. Run it like the original script, but call it by its path in the command: path_to/merge_metaphlan_tables_abs.py

Hi Tim,

Just to be sure: does this script take the value of ‘estimated_number_of_reads_from_the_clade’?
To be fair, I am really confused about what exactly this means. Is it really an “absolute count”? If you try to convert these numbers to relative abundances, they do not agree with the reported values.

Example:

#61414548 reads processed
#SampleID	Metaphlan_Analysis
#estimated_reads_mapped_to_known_clades:27677600
#clade_name	clade_taxid	relative_abundance	coverage	estimated_number_of_reads_from_the_clade
k__Bacteria	2	100.0	9.90582	27677600
k__Bacteria|p__Firmicutes	2|1239	76.54006	7.58192	16681398
k__Bacteria|p__Proteobacteria	2|1224	20.49451	2.03015	10275637
k__Bacteria|p__Actinobacteria	2|201174	2.5314	0.25076	558110
k__Bacteria|p__Bacteroidetes	2|976	0.29787	0.02951	133225
k__Bacteria|p__Candidatus_Melainabacteria	2|1798710	0.11185	0.01108	23746
k__Bacteria|p__Bacteria_unclassified	2|35929085	0.02431	0.00241	5484

Take ‘p__Firmicutes’, with estimated_number_of_reads_from_the_clade = 16681398. If we divide this by the total of 27677600, the relative abundance should be 60.27%, not 76.5%. :s
However, it does seem as if the “coverage” and relative abundance correlate (meaning the relative abundance might have been calculated using the coverage information?).
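
The arithmetic can be checked with the numbers from the example. This sketch only compares two ratios, the share of reads and the share of coverage, against the reported relative abundance. It does not establish how Metaphlan actually computes relative abundance.

```python
# Values copied from the example profile:
# clade -> (relative_abundance, coverage, estimated_number_of_reads_from_the_clade)
rows = {
    "p__Firmicutes": (76.54006, 7.58192, 16681398),
    "p__Proteobacteria": (20.49451, 2.03015, 10275637),
    "p__Actinobacteria": (2.5314, 0.25076, 558110),
}
total_coverage = 9.90582  # k__Bacteria coverage
total_reads = 27677600    # k__Bacteria estimated reads


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


for clade, (rel_ab, coverage, reads) in rows.items():
    from_reads = percent(reads, total_reads)
    from_coverage = percent(coverage, total_coverage)
    print(clade, rel_ab, round(from_reads, 2), round(from_coverage, 2))

# For p__Firmicutes the read share is about 60.27 and the coverage share about 76.54.
```

For these three phyla, the coverage share matches the reported relative abundance to two decimals, while the read share does not.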

I don’t know if I should be asking you this, Tim, but maybe you understand it better than I do? It would obviously be great if someone from the developer team could also chime in.

Hello!

That’s right.

I always put “absolute” in quotation marks when referring to these abundances, because they are not exactly true absolute counts.
This discussion can answer your question better than I can.
