Hello,
I processed my dataset with MetaPhlAn 4.0.3 and tried to merge the profile output files using merge_metaphlan_tables.py with the --gtdb_profiles argument.
This is the command I entered:
/BiO/Access/home/hjy/miniconda3/envs/biobakery/lib/python3.7/site-packages/metaphlan/utils/merge_metaphlan_tables.py \
/BiO5/TBD220344/metaphlan_no9/*profiled.mpa4.tsv \
-o /BiO5/TBD220344/metaphlan_no9/no9_merged.mpa4.tsv \
--gtdb_profiles
It returned the following error message:
Traceback (most recent call last):
  File "/BiO/Access/home/hjy/miniconda3/envs/biobakery/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'relative_abundance'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/BiO/Access/home/hjy/miniconda3/envs/biobakery/lib/python3.7/site-packages/metaphlan/utils/merge_metaphlan_tables.py", line 77, in <module>
    main()
  File "/BiO/Access/home/hjy/miniconda3/envs/biobakery/lib/python3.7/site-packages/metaphlan/utils/merge_metaphlan_tables.py", line 73, in main
    merge(args.aistms, open(args.o, 'w') if args.o else sys.stdout, args.gtdb_profiles)
  File "/BiO/Access/home/hjy/miniconda3/envs/biobakery/lib/python3.7/site-packages/metaphlan/utils/merge_metaphlan_tables.py", line 36, in merge
    profiles_list.append(pd.Series(data=iIn['relative_abundance'], index=iIn.index,
  File "/BiO/Access/home/hjy/miniconda3/envs/biobakery/lib/python3.7/site-packages/pandas/core/frame.py", line 3458, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/BiO/Access/home/hjy/miniconda3/envs/biobakery/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'relative_abundance'
This is the header of one of the MetaPhlAn output files:
#mpa_vJan21_CHOCOPhlAnSGB_202103
#/BiO/Access/home/hjy/miniconda3/envs/biobakery/bin/metaphlan /BiO5/TBD220344/kneaddata_out/stool_SG_001/stool_SG_001_R1_kneaddata_paired_1.fastq,/BiO5/TBD220344/kneaddata_out/stool_SG_001/stool_SG_001_R1_kneaddata_paired_2.fastq --input_type fastq --bowtie2db /BiO/BioResources/DBs/Biobakery/CHOCOPhlAnSGB_202103/ --index mpa_vJan21_CHOCOPhlAnSGB_202103 --bowtie2out /BiO5/TBD220344/bowtie2_out/stool_SG_001.mpa4.bowtie2.bz2 --output_file /BiO5/TBD220344/metaphlan_out/stool_SG_001_profiled.mpa4.tsv --nproc 36 --offline
#61920632 reads processed
#SampleID Metaphlan_Analysis
#clade_name NCBI_tax_id relative_abundance additional_species
And this is the last line of the same MetaPhlAn output file:
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Lachnospiraceae|g__Lachnospiraceae_unclassified|s__Lachnospiraceae_bacterium|t__SGB4782 2|1239|186801|186802|186803||1898203| 4e-05
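As a sanity check on the header, a minimal sketch like the one below can list the column names that a profile file declares. It assumes that the last line starting with '#' holds the column names and that the names themselves contain no whitespace. The function profile_columns and the sample lines are hypothetical, written only for illustration, and are not part of MetaPhlAn.

```python
from itertools import takewhile


def profile_columns(lines):
    """Return the column names from the last leading '#' line of a profile.

    Assumption: the final comment line at the top of the file is the
    column header, as in the header shown above.
    """
    # Collect only the leading comment lines; stop at the first data row.
    headers = [line.strip() for line in takewhile(lambda l: l.startswith("#"), lines)]
    if not headers:
        return []
    # Split on any whitespace so the check works with tabs or spaces.
    return headers[-1].lstrip("#").split()


# Header lines copied from the profile above; the data row is a placeholder.
sample = [
    "#mpa_vJan21_CHOCOPhlAnSGB_202103\n",
    "#61920632 reads processed\n",
    "#SampleID Metaphlan_Analysis\n",
    "#clade_name\tNCBI_tax_id\trelative_abundance\tadditional_species\n",
    "placeholder_clade\t0\t0.0\t\n",
]

cols = profile_columns(sample)
print(cols)
print("relative_abundance" in cols)
```

To check a real file, the same function can be called on an open file handle, for example profile_columns(open(path)). For the header shown above, 'relative_abundance' is among the declared columns, so the file itself does not appear to be missing it.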
I don’t understand what is causing this KeyError, since the header does list a relative_abundance column.
How can I resolve it?
Please advise.
Sincerely,
Kirby