The bioBakery help forum

GSE111889 sample ambiguity

We were interested in your study whose data is deposited at GEO at the following location: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE111889

However, there seems to be a discrepancy in the raw counts that have been deposited: the two files listed below report different count values:

  • GSE111889_RAW.tar
  • GSE111889_host_tx_counts.tsv.gz

This discrepancy is elaborated in the attached screenshot.

Do you know what may have happened here and which data is correct to use?

My colleague also found some discrepancies in GSE111889_series_matrix.txt, where 4 samples have ‘rectum tissue’ under ‘Sample_source_name_ch1’; however, the biopsy location values from ‘Sample_characteristics_ch1’ give a different location:

  • ‘GSM3043425’: ‘Ileum’
  • ‘GSM3043517’: ‘Transverse colon’
  • ‘GSM3043535’: ‘Cecum’
  • ‘GSM3043564’: ‘Cecum’

Would it be possible to get correct values for either tissue type or biopsy location?
Also, the sample ‘GSM3043543’ does not state where the biopsy was taken; it is described only as ‘non-inflamed’, with ‘!Sample_source_name_ch1’ given as an unspecified ‘tissue’. Would it be possible to clarify?
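
For anyone who wants to reproduce this kind of check, here is a minimal sketch in Python. It assumes the usual GEO series-matrix layout (tab-separated lines, quoted values, one `!Sample_characteristics_ch1` line per characteristic in `name: value` form) and assumes the characteristic is literally named `biopsy location`. The function names are hypothetical and not part of any bioBakery or GEO tool.

```python
import csv
import io
from collections import defaultdict


def parse_series_matrix(text):
    """Collect source name and characteristics per sample from series-matrix text."""
    accessions, sources, characteristic_rows = [], [], []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue
        key, values = row[0], row[1:]
        if key == "!Sample_geo_accession":
            accessions = values
        elif key == "!Sample_source_name_ch1":
            sources = values
        elif key == "!Sample_characteristics_ch1":
            # This key repeats: one line per characteristic (assumed layout).
            characteristic_rows.append(values)

    samples = {}
    for i, gsm in enumerate(accessions):
        characteristics = {}
        for values in characteristic_rows:
            if i < len(values) and ": " in values[i]:
                name, value = values[i].split(": ", 1)
                characteristics[name.strip()] = value.strip()
        samples[gsm] = {
            "source": sources[i] if i < len(sources) else "",
            "characteristics": characteristics,
        }
    return samples


def group_by_source_and_location(samples, location_key="biopsy location"):
    """Group sample IDs by (source name, biopsy location) so odd pairs stand out.

    Samples without the assumed `location_key` are grouped under None.
    """
    groups = defaultdict(list)
    for gsm, info in samples.items():
        location = info["characteristics"].get(location_key)
        groups[(info["source"], location)].append(gsm)
    return dict(groups)
```

Passing the contents of GSE111889_series_matrix.txt to `parse_series_matrix` and the result to `group_by_source_and_location` gives one entry per (source name, location) pair. Rare pairs, or pairs with a missing location, are the ones worth inspecting by hand.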

Hi user,

Thank you for pointing out the glitches in the GEO GSE contents. Apologies for the confusion; it seems that there are some inconsistencies in the !Sample_source_name_ch1 field of GSE111889_series_matrix.txt.

The corrected Sample_source_name_ch1 values are posted below:

GSM3043425: 
Sample_source_name_ch1: intestinal tissue
biopsy location: ‘Ileum’

GSM3043517: 
Sample_source_name_ch1: colon tissue
biopsy location: ‘Transverse colon’

GSM3043535: 
Sample_source_name_ch1: intestinal tissue
biopsy location: ‘Cecum’

GSM3043564: 
Sample_source_name_ch1: intestinal tissue
biopsy location: ‘Cecum’

For the non-inflamed GSM3043543 sample:

Sample_source_name_ch1: colon tissue
biopsy location: biopsy was taken from a `non-inflamed site` in the colon

Both the raw and host_tx_counts tables seem to contain counts for 251 samples. Could you give me a few more details on the count inconsistency?

  • GSE111889_RAW.tar: 251 sample counts
  • GSE111889_host_tx_counts.tsv.gz: 251 sample counts
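
A sample tally like the one above could be reproduced roughly as follows. This is only a sketch under assumptions that the thread does not confirm: that GSE111889_RAW.tar holds one file per sample, and that the first header column of GSE111889_host_tx_counts.tsv.gz is a feature ID with one column per sample after it. The helper names are hypothetical.

```python
import gzip
import tarfile


def tar_member_names(tar_path):
    """Return the names of the regular files stored in a tar archive."""
    with tarfile.open(tar_path) as tar:
        return [member.name for member in tar.getmembers() if member.isfile()]


def table_sample_columns(tsv_gz_path):
    """Return the header columns after the first one of a gzipped TSV.

    The first column is assumed to hold feature IDs, not a sample.
    """
    with gzip.open(tsv_gz_path, "rt") as handle:
        header = handle.readline().rstrip("\n").split("\t")
    return header[1:]
```

Comparing `len(tar_member_names(...))` with `len(table_sample_columns(...))` only confirms that the number of samples agrees. It says nothing about whether the values under each sample ID match between the two files.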

Regards,
Sagun

Hi Sagun, thank you for clarifying the metadata. I will pass the information to my colleague.

Regarding the raw counts, the inconsistency seems to be that the sample IDs are mixed up. Could you take a look at the attached output that I produced: