sam转为bam文件报错

怕什么报错呢,重来一次不就好了吗!

学员群有人提问,他们上完了我们的转录组课程,自己拿服务器去跑一个文献数据,有一个样本的sam转为bam文件报错,得到文件如下:

213M 2月 20 20:59 SRR10574404.hisat.bam.tmp.0000.bam
211M 2月 20 20:59 SRR10574404.hisat.bam.tmp.0001.bam
213M 2月 20 20:59 SRR10574404.hisat.bam.tmp.0002.bam
232M 2月 20 20:59 SRR10574404.hisat.bam.tmp.0003.bam
242M 2月 20 21:00 SRR10574404.hisat.bam.tmp.0004.bam
207M 2月 20 21:00 SRR10574404.hisat.bam.tmp.0005.bam
208M 2月 20 21:00 SRR10574404.hisat.bam.tmp.0006.bam
211M 2月 20 21:00 SRR10574404.hisat.bam.tmp.0007.bam
216M 2月 20 21:00 SRR10574404.hisat.bam.tmp.0008.bam
227M 2月 20 21:00 SRR10574404.hisat.bam.tmp.0009.bam
206M 2月 20 21:00 SRR10574404.hisat.bam.tmp.0010.bam
199M 2月 20 21:00 SRR10574404.hisat.bam.tmp.0011.bam
202M 2月 20 21:00 SRR10574404.hisat.bam.tmp.0012.bam
203M 2月 20 21:00 SRR10574404.hisat.bam.tmp.0013.bam
207M 2月 20 21:01 SRR10574404.hisat.bam.tmp.0014.bam
214M 2月 20 21:01 SRR10574404.hisat.bam.tmp.0015.bam
193M 2月 20 21:01 SRR10574404.hisat.bam.tmp.0016.bam
200M 2月 20 21:01 SRR10574404.hisat.bam.tmp.0017.bam
206M 2月 20 21:01 SRR10574404.hisat.bam.tmp.0018.bam
205M 2月 20 21:01 SRR10574404.hisat.bam.tmp.0019.bam
229M 2月 20 21:01 SRR10574404.hisat.bam.tmp.0020.bam
206M 2月 20 21:01 SRR10574404.hisat.bam.tmp.0021.bam
210M 2月 20 21:01 SRR10574404.hisat.bam.tmp.0022.bam
212M 2月 20 21:01 SRR10574404.hisat.bam.tmp.0023.bam
218M 2月 20 21:02 SRR10574404.hisat.bam.tmp.0024.bam
238M 2月 20 21:02 SRR10574404.hisat.bam.tmp.0025.bam
210M 2月 20 21:02 SRR10574404.hisat.bam.tmp.0026.bam
207M 2月 20 21:02 SRR10574404.hisat.bam.tmp.0027.bam
209M 2月 20 21:02 SRR10574404.hisat.bam.tmp.0028.bam
213M 2月 20 21:02 SRR10574404.hisat.bam.tmp.0029.bam
 36G 2月 20 00:19 SRR10574404.hisat.sam

仔细看了看报错信息,如下:

[E::sam_parse1] incomplete aux field
samtools sort: truncated file. Aborting

它的比对情况也非常好;

48793007 reads; of these:
 48793007 (100.00%) were paired; of these:
 6146709 (12.60%) aligned concordantly 0 times
 39739427 (81.44%) aligned concordantly exactly 1 time
 2906871 (5.96%) aligned concordantly >1 times
 ----
 6146709 pairs aligned concordantly 0 times; of these:
 377273 (6.14%) aligned discordantly 1 time
 ----
 5769436 pairs aligned 0 times concordantly or discordantly; of these:
 11538872 mates make up the pairs; of these:
 7438164 (64.46%) aligned 0 times
 3539899 (30.68%) aligned exactly 1 time
 560809 (4.86%) aligned >1 times
92.38% overall alignment rate

没道理啊,我只能说建议他重新跑一下比对流程拿到全新sam文件:

id=SRR10574404
indexPrefix=/home/data/server/reference/index/hisat/hg38/genome 
nohup hisat2 -p 1 -x $indexPrefix -1 ${id}*_1_val_1.fq.gz -2 ${id}*_2_val_2.fq.gz -S ${id}.hisat.sam & 


# sam文件转bam
ls *.sam|while read id ;do (nohup samtools sort -O bam -@ 2 -o $(basename ${id} ".sam").bam ${id} & );done
# 这个过程会输出大量中间文件
rm *.sam 
# 为bam文件建立索引
ls *.bam |xargs -i samtools index {}

这次就成功了:

7.2G 2月 21 15:42 SRR10574404.hisat.bam
36G 2月 21 12:27 SRR10574404.hisat.sam

没有输出那么多临时文件然后卡死。

Comments are closed.