Feature request: Include taxonomy columns for baqlava_join_table utility script

Hi,

Thank you for creating the baqlava_join_table utility function. Unfortunately, it was not what I was expecting. What I get is just the VGB column and then all the samples’ RPKs. Is there a reason the taxonomy columns are not included in the joined table? As it is, I’m still parsing through the individual sample files so I know what species each VGB is from. If I have time over the holiday break, I’ll take a look at the code and see if I can do a pull request.

viral_benchmarking_BAQLaVa_VGB_table.tsv (4.6 KB)

Hi @scottdaniel_at_chop!
The intent of the merge tables function is to produce a table with one column per sample and one row per VGB, with a single abundance value per VGB. This is a common format for this type of data, particularly for compatibility with other bioBakery tools. (Another trade-off you’ll notice is that the nucleotide and translated subset abundances are dropped.)
If you need a merged table with the taxonomy information as well, you can use the file located at master/baqlava/utility_files/VGB_taxonomy.txt to do this in a single merge, rather than pulling individually from each sample’s profile. Be sure to merge on the ‘segment group’ column to account for viruses with segmented genomes.
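As a rough illustration of that single merge, here is a minimal standard-library Python sketch. It is not part of BAQLaVa: the helper names `read_tsv` and `add_taxonomy`, the `VGB` and `species` column names, and the toy rows are all assumptions made for the example. Only the ‘segment group’ column name comes from the note above, and the sketch assumes the VGB identifiers in the joined table match the values in that column. Check the real headers of both files before adapting it.

```python
import csv
import io


def read_tsv(handle):
    """Read a tab-separated file (or file-like object) into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def add_taxonomy(joined_rows, taxonomy_rows, joined_key, taxonomy_key="segment group"):
    """Left-join taxonomy columns onto the joined abundance table.

    joined_key is the identifier column in the joined table (its real name is
    an assumption here); taxonomy_key is the column to merge on in the
    taxonomy file.
    """
    # Index taxonomy rows by the merge key. If a key appears more than once,
    # the first row is kept so the output still has one row per VGB.
    taxonomy_by_key = {}
    for row in taxonomy_rows:
        taxonomy_by_key.setdefault(row[taxonomy_key], row)

    # Taxonomy columns to append, excluding the merge key itself.
    first_row = taxonomy_rows[0] if taxonomy_rows else {}
    tax_columns = [col for col in first_row if col != taxonomy_key]

    merged = []
    for row in joined_rows:
        tax = taxonomy_by_key.get(row[joined_key], {})
        merged_row = dict(row)
        for col in tax_columns:
            # Never overwrite an abundance column; leave blanks for unmatched VGBs.
            merged_row.setdefault(col, tax.get(col, ""))
        merged.append(merged_row)
    return merged


# Toy stand-ins for the two files (made-up values and column names).
joined_tsv = "VGB\tsample1\tsample2\nVGB_1\t10.0\t0.0\nVGB_2\t3.5\t7.25\nVGB_3\t0.0\t1.0\n"
taxonomy_tsv = "segment group\tspecies\nVGB_1\tvirus A\nVGB_2\tvirus B\n"

merged = add_taxonomy(
    read_tsv(io.StringIO(joined_tsv)),
    read_tsv(io.StringIO(taxonomy_tsv)),
    joined_key="VGB",
)
for row in merged:
    print("\t".join(row.values()))
```

With real files, the `io.StringIO(...)` objects would be replaced by handles from `open(path, newline="")`. Keeping only the first taxonomy row per key is a design choice that preserves the one-row-per-VGB shape of the joined table.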

Best, Jordan