merge_metaphlan_tables.py Error

Hello,

I processed my dataset with MetaPhlAn 4.0.3 and tried to merge the profiled output files using merge_metaphlan_tables.py with the --gtdb_profiles argument.

This is the command I entered:

/BiO/Access/home/hjy/miniconda3/envs/biobakery/lib/python3.7/site-packages/metaphlan/utils/merge_metaphlan_tables.py \
/BiO5/TBD220344/metaphlan_no9/*profiled.mpa4.tsv \
-o /BiO5/TBD220344/metaphlan_no9/no9_merged.mpa4.tsv \
--gtdb_profiles

Then I got the following error message:

Traceback (most recent call last):
  File "/BiO/Access/home/hjy/miniconda3/envs/biobakery/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'relative_abundance'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/BiO/Access/home/hjy/miniconda3/envs/biobakery/lib/python3.7/site-packages/metaphlan/utils/merge_metaphlan_tables.py", line 77, in <module>
    main()
  File "/BiO/Access/home/hjy/miniconda3/envs/biobakery/lib/python3.7/site-packages/metaphlan/utils/merge_metaphlan_tables.py", line 73, in main
    merge(args.aistms, open(args.o, 'w') if args.o else sys.stdout, args.gtdb_profiles)
  File "/BiO/Access/home/hjy/miniconda3/envs/biobakery/lib/python3.7/site-packages/metaphlan/utils/merge_metaphlan_tables.py", line 36, in merge
    profiles_list.append(pd.Series(data=iIn['relative_abundance'], index=iIn.index,
  File "/BiO/Access/home/hjy/miniconda3/envs/biobakery/lib/python3.7/site-packages/pandas/core/frame.py", line 3458, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/BiO/Access/home/hjy/miniconda3/envs/biobakery/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'relative_abundance'

This is the header of the MetaPhlAn output files:

#mpa_vJan21_CHOCOPhlAnSGB_202103
#/BiO/Access/home/hjy/miniconda3/envs/biobakery/bin/metaphlan /BiO5/TBD220344/kneaddata_out/stool_SG_001/stool_SG_001_R1_kneaddata_paired_1.fastq,/BiO5/TBD220344/kneaddata_out/stool_SG_001/stool_SG_001_R1_kneaddata_paired_2.fastq --input_type fastq --bowtie2db /BiO/BioResources/DBs/Biobakery/CHOCOPhlAnSGB_202103/ --index mpa_vJan21_CHOCOPhlAnSGB_202103 --bowtie2out /BiO5/TBD220344/bowtie2_out/stool_SG_001.mpa4.bowtie2.bz2 --output_file /BiO5/TBD220344/metaphlan_out/stool_SG_001_profiled.mpa4.tsv --nproc 36 --offline
#61920632 reads processed
#SampleID       Metaphlan_Analysis
#clade_name     NCBI_tax_id     relative_abundance      additional_species

And this is the last line of a MetaPhlAn output file:

k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Lachnospiraceae|g__Lachnospiraceae_unclassified|s__Lachnospiraceae_bacterium|t__SGB4782   2|1239|186801|186802|186803||1898203|   4e-05

I don’t understand what is causing this error.
How can I resolve it?
Please advise.

Sincerely,
Kirby

I missed that merge_metaphlan_tables.py with the --gtdb_profiles argument must be run only after converting each profile from SGB to GTDB taxonomy (see the sketch below). :sweat_smile:
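For anyone hitting the same error, here is a minimal sketch of the workflow, assuming sgb_to_gtdb_profile.py accepts -i/-o as in my MetaPhlAn 4 installation and that both utilities are on PATH; the paths and output names follow my example above, so adjust them to your own setup.

# convert every SGB profile to GTDB taxonomy first
for profile in /BiO5/TBD220344/metaphlan_no9/*profiled.mpa4.tsv; do
    sgb_to_gtdb_profile.py -i "$profile" -o "${profile%.tsv}.gtdb.tsv"
done

# then merge only the converted (GTDB) profiles
merge_metaphlan_tables.py --gtdb_profiles \
    /BiO5/TBD220344/metaphlan_no9/*profiled.mpa4.gtdb.tsv \
    -o /BiO5/TBD220344/metaphlan_no9/no9_merged.gtdb.mpa4.tsv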

All issues are now resolved.

A pandas KeyError occurs when you try to access a column/row label that doesn’t exist in your DataFrame. Usually this happens when you misspell a column/row name or include an unwanted space before or after it. Before doing anything with the data frame, use print(df.columns) to check whether the column exists.

print(df.columns)

I was getting a similar kind of error in one of my scripts. It turned out that the particular index label was missing from my data frame because I had dropped two empty rows. If this is the case, you can call df.reset_index(inplace=True) and the error should be resolved.
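For illustration, a tiny self-contained example of diagnosing this kind of KeyError; the data frame and its column names are made up and not taken from the MetaPhlAn output.

import pandas as pd

# Hypothetical frame standing in for a file whose columns don't match expectations
df = pd.DataFrame({"clade_name": ["k__Bacteria"], "abundance": [99.9]})

print(df.columns)            # Index(['clade_name', 'abundance'], dtype='object')
# df["relative_abundance"]   # would raise KeyError: 'relative_abundance'

# Fix the label mismatch (or fix the input file) before accessing the column
df = df.rename(columns={"abundance": "relative_abundance"})
print(df["relative_abundance"])

# If rows were dropped earlier, rebuild a clean 0..n-1 index as well
df.reset_index(drop=True, inplace=True)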