Why would the species information in a GPL record be wrong?

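Recently, while batch re-annotating the probe sequences of every GPL platform in the GEO database, I noticed that even when the reference genome is chosen automatically from the species information that comes with each array, some platforms still have a very low probe alignment rate. GPL21827 is one example: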

60898 reads; of these:
 60898 (100.00%) were unpaired; of these:
 59099 (97.05%) aligned 0 times
 1753 (2.88%) aligned exactly 1 time
 46 (0.08%) aligned >1 times
2.95% overall alignment rate
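The summary above has the shape of a bowtie2 alignment report (the aligner isn't named in this post, so that is an assumption). When re-annotating hundreds of platforms in bulk, it helps to pull the overall alignment rate out of each log automatically and flag the outliers. Below is a minimal sketch, assuming each log ends with a line like `2.95% overall alignment rate`; the function names and the 50% cutoff are hypothetical choices for illustration, not part of any tool.

```python
import re


def overall_alignment_rate(summary: str) -> float:
    """Return the overall alignment rate (percent) from a bowtie2-style summary."""
    match = re.search(r"([\d.]+)% overall alignment rate", summary)
    if match is None:
        raise ValueError("no 'overall alignment rate' line found")
    return float(match.group(1))


def is_suspicious(summary: str, threshold: float = 50.0) -> bool:
    """Flag a platform whose probes align below a (hypothetical) cutoff."""
    return overall_alignment_rate(summary) < threshold


# The GPL21827 summary shown above
log = """60898 reads; of these:
  60898 (100.00%) were unpaired; of these:
    59099 (97.05%) aligned 0 times
    1753 (2.88%) aligned exactly 1 time
    46 (0.08%) aligned >1 times
2.95% overall alignment rate"""

print(overall_alignment_rate(log), is_suspicious(log))  # 2.95 True
```

A regex on the final line is used rather than recomputing from the per-category counts, since that line already reports the aligned-once plus aligned-multiple total (1753 + 46 = 1799 of 60898, about 2.95%).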

The cause: in GEO this platform is actually recorded as mouse, even though it is clearly a human array!

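Title: Agilent-079487 Arraystar Human LncRNA microarray V4 (Probe Name version)
GEO accession: GPL21827
Status: Public on May 07 2016
Submission date: 2016/5/6
Last update date: 2016/5/7
Technology: in situ oligonucleotide
Organism: Mus musculus
Manufacturer: Agilent Technologies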

Truly bizarre: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL21827

Yet when I look it up by its title on the GEO website, Agilent-079487 Arraystar Human LncRNA microarray V4 (Probe Name version), the species shows up as human.
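Since the organism field alone can't be trusted, one cheap safeguard is to cross-check it against species words in the platform title before choosing a reference genome. Here is a minimal sketch, assuming a hand-written organism-to-genome table and keyword list; the names `REFERENCE_BY_ORGANISM`, `TITLE_KEYWORDS`, `pick_reference`, and the `hg38`/`mm10` labels are all hypothetical and only illustrate the idea.

```python
from __future__ import annotations

# Hypothetical mapping from GEO organism names to reference genome labels
REFERENCE_BY_ORGANISM = {
    "Homo sapiens": "hg38",
    "Mus musculus": "mm10",
}

# Hypothetical title keywords that hint at a species
TITLE_KEYWORDS = {
    "Homo sapiens": ("human",),
    "Mus musculus": ("mouse", "murine"),
}


def species_from_title(title: str) -> list[str]:
    """Return the organisms whose keywords appear in the title (case-insensitive)."""
    lowered = title.lower()
    return [org for org, words in TITLE_KEYWORDS.items()
            if any(word in lowered for word in words)]


def pick_reference(title: str, organism: str) -> tuple[str, bool]:
    """Return (reference, conflict); conflict is True when the title names
    a species that differs from the recorded organism."""
    hinted = species_from_title(title)
    conflict = bool(hinted) and organism not in hinted
    return REFERENCE_BY_ORGANISM.get(organism, "unknown"), conflict


ref, conflict = pick_reference(
    "Agilent-079487 Arraystar Human LncRNA microarray V4 (Probe Name version)",
    "Mus musculus",
)
print(ref, conflict)  # mm10 True
```

The sketch still returns the reference implied by the organism field but raises a flag, so conflicting platforms like GPL21827 can be reviewed by hand instead of being silently aligned to the wrong genome. Titles with no species word produce no conflict and fall back to the organism field.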

This is not the only platform with a low alignment rate:

A circRNA array: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL23467

170351 reads; of these:
 170351 (100.00%) were unpaired; of these:
 169391 (99.44%) aligned 0 times
 811 (0.48%) aligned exactly 1 time
 149 (0.09%) aligned >1 times
0.56% overall alignment rate

A miRNA array: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL8180

380 reads; of these:
 380 (100.00%) were unpaired; of these:
 215 (56.58%) aligned 0 times
 138 (36.32%) aligned exactly 1 time
 27 (7.11%) aligned >1 times
43.42% overall alignment rate
