However, there seems to be a discrepancy in the deposited raw counts: the two files listed below report different count values.
GSE111889_RAW.tar
GSE111889_host_tx_counts.tsv.gz
This discrepancy is illustrated in the attached screenshot.
Do you know what may have happened here and which data is correct to use?
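One possible explanation would be that the two files hold the same count vectors under different sample labels. A minimal Python sketch of how that could be checked is below. It assumes each table has already been loaded into a dict mapping sample ID to {gene: count}; the function name match_samples and the toy IDs S1/S2 and genes g1..g3 are hypothetical and are not taken from the GEO files.

```python
def match_samples(table_a, table_b):
    """For each sample in table_a, list the samples in table_b whose counts
    are identical over the genes the two samples share.

    Both arguments are dicts of the form {sample_id: {gene: count}}.
    """
    matches = {}
    for id_a, counts_a in table_a.items():
        hits = []
        for id_b, counts_b in table_b.items():
            shared = counts_a.keys() & counts_b.keys()
            # Require at least one shared gene and exact agreement on all of them.
            if shared and all(counts_a[g] == counts_b[g] for g in shared):
                hits.append(id_b)
        matches[id_a] = hits
    return matches


# Toy data (hypothetical): the same two count vectors, with labels swapped.
per_sample_files = {
    "S1": {"g1": 5, "g2": 0, "g3": 12},
    "S2": {"g1": 7, "g2": 3, "g3": 1},
}
combined_table = {
    "S1": {"g1": 7, "g2": 3, "g3": 1},
    "S2": {"g1": 5, "g2": 0, "g3": 12},
}

matches = match_samples(per_sample_files, combined_table)
swapped = {a: b for a, b in matches.items() if b and a not in b}
print(swapped)
```

A sample that only matches under a different ID points to a labeling problem rather than a difference in quantification; a sample with no match at all would suggest the counts themselves differ.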
My colleague also found some discrepancies in GSE111889_series_matrix.txt, where 4 samples have ‘rectum tissue’ under ‘Sample_source_name_ch1’; however, the biopsy location values from ‘Sample_characteristics_ch1’ give a different location:
‘GSM3043425’: ‘Ileum’
‘GSM3043517’: ‘Transverse colon’
‘GSM3043535’: ‘Cecum’
‘GSM3043564’: ‘Cecum’
Would it be possible to get correct values for either tissue type or biopsy location?
Also, sample ‘GSM3043543’ does not state where the biopsy was taken: it is instead described as ‘non-inflamed’, with ‘!Sample_source_name_ch1’ given as an unspecified ‘tissue’. Would it be possible to clarify?
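For reference, a rough Python sketch of how such mismatches could be flagged automatically. It relies on the series matrix being tab-separated with quoted values and on characteristics being written as "key: value"; the key name "biopsy location", the function name find_location_mismatches and the sample GSM_EXAMPLE are assumptions for illustration and may not match the actual file.

```python
import csv


def find_location_mismatches(lines, key="biopsy location"):
    """Return (sample_id, source_name, location) for every sample whose
    source name does not mention its biopsy location.

    `lines` is any iterable of series-matrix lines. `key` is the assumed
    characteristics key and may need adjusting for the real file.
    """
    ids, sources, locations = [], [], {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue
        field, values = row[0], row[1:]
        if field == "!Sample_geo_accession":
            ids = values
        elif field == "!Sample_source_name_ch1":
            sources = values
        elif field == "!Sample_characteristics_ch1":
            # Several characteristics rows exist; keep only the one for `key`.
            for i, value in enumerate(values):
                name, _, content = value.partition(":")
                if name.strip().lower() == key:
                    locations[i] = content.strip()
    return [
        (gsm, src, locations[i])
        for i, (gsm, src) in enumerate(zip(ids, sources))
        if i in locations and locations[i].lower() not in src.lower()
    ]


# Toy input: GSM3043425 as reported above, plus one hypothetical consistent sample.
toy_lines = [
    '!Sample_geo_accession\t"GSM3043425"\t"GSM_EXAMPLE"',
    '!Sample_source_name_ch1\t"rectum tissue"\t"ileum tissue"',
    '!Sample_characteristics_ch1\t"biopsy location: Ileum"\t"biopsy location: Ileum"',
]

mismatches = find_location_mismatches(toy_lines)
print(mismatches)
```

The comparison is a plain case-insensitive substring test, so it only catches cases where the source name fails to mention the location at all; samples with no location entry are skipped rather than reported.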
Thank you for pointing out the glitches in the GEO GSE contents. Apologies for the confusion; there do seem to be some inconsistencies in the !Sample_source_name_ch1 field of GSE111889_series_matrix.txt.
Corrected Sample_source_name_ch1 values are posted below:
Hi Sagun, thank you for clarifying the metadata. I will pass the information to my colleague.
Regarding the raw counts, the inconsistency seems to be that the sample IDs are mixed up. Could you please take a look at the attached output that I produced: