<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; Statistics</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/%e7%bb%9f%e8%ae%a1%e5%ad%a6/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
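	<description>Feel free to leave a message and join the discussion on the forum biotrainee.com, or follow the WeChat official account of the same name, biotrainee.</description>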
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>What exactly does quantile normalization do to your data?</title>
		<link>http://www.bio-info-trainee.com/2043.html</link>
		<comments>http://www.bio-info-trainee.com/2043.html#comments</comments>
		<pubDate>Wed, 23 Nov 2016 11:48:51 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Bioinformatics Basics]]></category>
		<category><![CDATA[normalization]]></category>
		<category><![CDATA[quantile]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2043</guid>
		<description><![CDATA[Many people groan at the mention of normalization: there are dozens of methods. For microarray and other expression data &#8230; <a href="http://www.bio-info-trainee.com/2043.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
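				<content:encoded><![CDATA[<p>Many people groan at the mention of normalization: there are dozens of methods. For microarray and other expression data, though, the most common one is quantile normalization. So what exactly does it do to our expression data? First, be clear about the layout: each column of the expression matrix is a sample, each row is a gene or probe, and each value is an expression level. Quantile normalization sorts every column independently, averages the sorted matrix row by row to obtain a <strong><span style="color: #ff0000;">mean vector</span></strong>, and then replaces each value in the original matrix with the mean at that value's rank within its column. After normalization, every column therefore contains only the values of the mean vector, each placed according to the original ordering. See the figure below for details:<span id="more-2043"></span></p>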
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/11/14.png"><img class="alignnone size-full wp-image-2044" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/11/14.png" alt="1" width="595" height="813" /></a></p>
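<div>The three steps described above (sort each column, average across samples at each rank, map the means back by rank) can be sketched in plain Python. This is only an illustration under stated assumptions: the function name <code>quantile_normalize</code> and the toy matrix are made up for this example, and ties are broken by row order for simplicity, which library implementations may handle differently.</div>

```python
def quantile_normalize(matrix):
    """Quantile-normalize a matrix given as a list of rows.

    Rows are genes/probes and columns are samples, as in the post.
    Assumption: ties within a column are broken by row order.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]

    # Step 1: sort each column (sample) independently.
    sorted_columns = [sorted(col) for col in columns]

    # Step 2: average across samples at each rank to get the mean vector.
    rank_means = [
        sum(col[i] for col in sorted_columns) / n_cols for i in range(n_rows)
    ]

    # Step 3: replace every original value with the mean for its rank
    # within its own column.
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = rank_means[rank]
    return normalized


# Hypothetical toy data: 4 genes (rows) measured in 3 samples (columns).
expression = [
    [5, 3, 3],
    [2, 1, 6],
    [3, 7, 12],
    [4, 5, 9],
]
normalized = quantile_normalize(expression)
for row in normalized:
    print(row)
```

<div>In this toy example the mean vector is 2, 4, 6, 8, so after normalization every column contains exactly those four values, arranged according to the original ranking within that sample.</div>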
<div>In R, the preprocessCore package is recommended for quantile normalization; there is no need to reinvent the wheel!</div>
<div></div>
<div>
<div>However, knowing when quantile normalization is appropriate and when it is not is considerably more complicated; read this preprint for yourself:
<div><a href="http://biorxiv.org/content/biorxiv/early/2014/12/04/012203.full.pdf">http://biorxiv.org/content/biorxiv/early/2014/12/04/012203.full.pdf</a></div>
</div>
</div>
<div><img class="alignnone size-full wp-image-2045" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/11/22.png" alt="2" width="946" height="889" /></div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2043.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Statistics in biology: the statistics collection published by Nature</title>
		<link>http://www.bio-info-trainee.com/1047.html</link>
		<comments>http://www.bio-info-trainee.com/1047.html#comments</comments>
		<pubDate>Fri, 16 Oct 2015 11:14:19 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[nature]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1047</guid>
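		<description><![CDATA[Biostatistics is about the only part of biology with some technical depth and a real barrier to entry, and it is also a pain point for the vast majority of &#8230; <a href="http://www.bio-info-trainee.com/1047.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Biostatistics is about the only part of biology with some technical depth and a real barrier to entry, and it is also a pain point for the vast majority of researchers. Those who are up to it can read Nature's collection of discussions on statistics, which focuses mainly on statistics as applied to the natural sciences.</p>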
<div dir="ltr"><a href="http://www.nature.com/collections/qghhqm" target="_blank">http://www.nature.com/collections/qghhqm</a></div>
<div dir="ltr">It contains several memorable statistical maxims:</div>
<div dir="ltr">
<div>
<div>Statistics does not tell us whether we are right. It tells us the chances of being wrong.</div>
<div>Quality is often more important than quantity.</div>
<div>The meaning of error bars is often misinterpreted, as is the statistical significance of their overlap.</div>
<div>Good experimental designs mitigate experimental error and the impact of factors not under study.</div>
<div></div>
</div>
<div>Article list:</div>
<div>
<div>Research methods: Know when your numbers are significant</div>
<div>Scientific method: Statistical errors</div>
<div>Weak statistical standards implicated in scientific irreproducibility</div>
<div>The fickle P value generates irreproducible results</div>
<div>Vital statistics</div>
<div>Experimental biology: Sometimes Bayesian statistics are better</div>
<div>A call for transparent reporting to optimize the predictive value of preclinical research</div>
<div>Power failure: why small sample size undermines the reliability of neuroscience</div>
<div>Basic statistical analysis in genetic case-control studies</div>
<div>Erroneous analyses of interactions in neuroscience: a problem of significance</div>
<div>Analyzing 'omics data using hierarchical models</div>
<div>Advantages and pitfalls in the application of mixed-model association methods</div>
<div>Quality control and conduct of genome-wide association meta-analyses</div>
<div>Circular analysis in systems neuroscience: the dangers of double dipping</div>
<div>A solution to dependency: using multilevel analysis to accommodate nested data</div>
<div>How does multiple testing correction work?</div>
<div>What is Bayesian statistics?</div>
<div>What is a hidden Markov model?</div>
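<div>The articles below actually cover the same statistics topics found in ordinary textbooks, but being published in Nature journals instantly makes them seem far more prestigious:</div>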
<div>Points of Significance: Importance of being uncertain</div>
<div>Points of Significance: Error bars</div>
<div>Points of Significance: Significance, P values and t-tests</div>
<div>Points of Significance: Power and sample size</div>
<div>Points of Significance: Visualizing samples with box plots</div>
<div>Points of Significance: Comparing samples part I</div>
<div>Points of Significance: Comparing samples part II</div>
<div>Points of Significance: Nonparametric tests</div>
<div>Points of Significance: Designing comparative experiments</div>
<div>Points of Significance: Analysis of variance and blocking</div>
<div>Points of Significance: Replication</div>
<div>Points of Significance: Nested designs</div>
<div>Points of Significance: Two-factor designs</div>
<div>Points of Significance: Sources of variation</div>
<div>Points of Significance: Split plot design</div>
<div>Points of Significance: Bayes' theorem</div>
<div>Points of Significance: Bayesian statistics</div>
<div>Points of Significance: Sampling distributions and the bootstrap</div>
<div>Points of Significance: Bayesian networks</div>
<div></div>
<div>Abstract of "Power failure: why small sample size undermines the reliability of neuroscience" (listed above): A study with low statistical power has a reduced chance of detecting a true effect, but it is less well appreciated that low power also reduces the likelihood that a statistically significant result reflects a true effect. Here, we show that the average statistical power of studies in the neurosciences is very low. The consequences of this include overestimates of effect size and low reproducibility of results. There are also ethical dimensions to this problem, as unreliable research is inefficient and wasteful. Improving reproducibility in neuroscience is a key priority and requires attention to well-established but often ignored methodological principles.</div>
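<div>The claim above, that low power also lowers the chance that a significant result reflects a true effect, can be made concrete with the positive predictive value, PPV = (power × R) / (power × R + α), where R is the pre-study odds that a tested effect is real and α is the significance threshold. The short Python sketch below is only an illustration: the function name and the example numbers (R = 0.25, α = 0.05) are assumptions chosen for this example, not values taken from the post.</div>

```python
def positive_predictive_value(power, prior_odds, alpha=0.05):
    """Probability that a statistically significant finding is a true effect.

    power      -- statistical power of the study (1 - beta)
    prior_odds -- pre-study odds R that a tested effect is real
    alpha      -- significance threshold (type I error rate)
    """
    true_positive_rate = power * prior_odds
    return true_positive_rate / (true_positive_rate + alpha)


# Hypothetical scenario: one in five tested effects is real (R = 0.25).
for power in (0.8, 0.5, 0.2):
    ppv = positive_predictive_value(power, prior_odds=0.25)
    print(f"power={power:.1f}  PPV={ppv:.2f}")
```

<div>Under these assumed numbers, a study with 80% power gives a PPV of about 0.80, while a study with 20% power gives a PPV of only about 0.50, so half of its significant findings would be false positives.</div>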
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1047.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
