Skip to content
GitLab
Projects
Groups
Snippets
Help
Loading...
Help
Help
Support
Community forum
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
S
stat_methods_bioinf
Project overview
Project overview
Details
Activity
Releases
Repository
Repository
Files
Commits
Branches
Tags
Contributors
Graph
Compare
Issues
0
Issues
0
List
Boards
Labels
Service Desk
Milestones
Merge Requests
0
Merge Requests
0
CI / CD
CI / CD
Pipelines
Jobs
Schedules
Operations
Operations
Incidents
Environments
Packages & Registries
Packages & Registries
Container Registry
Analytics
Analytics
CI / CD
Repository
Value Stream
Wiki
Wiki
Members
Members
Collapse sidebar
Close sidebar
Activity
Graph
Create a new issue
Jobs
Commits
Issue Boards
Open sidebar
Bernd Klaus
stat_methods_bioinf
Commits
efda91cc
Commit
efda91cc
authored
Aug 18, 2017
by
Bernd Klaus
Browse files
Options
Browse Files
Download
Email Patches
Plain Diff
introduced NB, mention DESeq2 var stab capabilities
parent
c2e9e1ac
Changes
3
Expand all
Hide whitespace changes
Inline
Side-by-side
Showing
3 changed files
with
116 additions
and
77 deletions
+116
-77
graphics_bioinf.R
graphics_bioinf.R
+2
-1
graphics_bioinf.Rmd
graphics_bioinf.Rmd
+47
-14
graphics_bioinf.html
graphics_bioinf.html
+67
-62
No files found.
graphics_bioinf.R
View file @
efda91cc
...
...
@@ -303,7 +303,8 @@ counts_norm_crc_mat <- select(counts_crc_tidy, sample_id, ensembl_id, count_norm
select
(
-
ensembl_id
)
%>%
as.matrix
()
meanSdPlot
(
counts_norm_crc_mat
)
meanSdPlot
(
counts_norm_crc_mat
)
$
gg
+
ylim
(
c
(
0
,
10
))
## ----session_info, cache = FALSE-----------------------------------------
sessionInfo
()
...
...
graphics_bioinf.Rmd
View file @
efda91cc
...
...
@@ -699,31 +699,64 @@ counts_norm_crc_mat <- select(counts_crc_tidy, sample_id, ensembl_id, count_norm
select
(-
ensembl_id
)
%>%
as
.
matrix
()
meanSdPlot
(
counts_norm_crc_mat
)
```
meanSdPlot
(
counts_norm_crc_mat
)
$
gg
+
ylim
(
c
(
0
,
2000
))
```
##
The
Negative
binomial
distribution
for
RNA
--
Seq
data
overdispersion
,
i
.
e
.
the
variance
is
greater
than
than
the
mean
.
In
fact
,
it
is
known
that
a
standard
Poisson
model
can
only
account
for
the
technical
noise
in
RNA
--
Seq
data
.
In
the
Poisson
model
the
variance
is
equal
to
the
mean
,
while
in
RNA
--
Seq
data
the
variance
is
greater
than
the
mean
.
A
popular
model
for
RNA
--
Seq
data
is
the
Negative
--
Binomial
-
distribution
(
NB
),
with
mean
\(
E
(
NB
)
=
\
mu
\)
and
variance
:
A
popular
way
to
model
this
is
to
use
a
Negative
--
Binomial
-
distribution
(
NB
),
which
includes
an
additional
parameter
dispersion
parameter
$\
alpha
$
such
that
$
E
(
NB
)
=
\
mu
$
and
\
begin
{
gather
*}
\
[
\
text
{
Var
[
NB
($\
mu
$,
$\
alpha
$)]}
=
\
mu
+
\
alpha
\
cdot
\
mu
^
2
\
end
{
gather
*}
\]
Hence
,
the
variance
depends
on
the
mean
,
and
the
dispersion
parameter
\(\
alpha
\)
controls
the
relationship
between
mean
and
variance
.
It
has
been
shown
(
Marioni
2008
)
that
in
RNA
--
Seq
data
alpha
is
greater
than
zero
,
unless
there
is
only
technical
noise
This
is
called
overdispersion
,
i
.
e
.
the
variance
is
greater
than
than
the
mean
.
(
For
\(\
alpha
=
0
\),
the
NB
distribution
becomes
the
Poisson
distribution
)
Single
cell
RNA
-
Seq
data
contains
many
zeros
,
due
to
limited
capture
efficiency
,
thus
zero
inflated
models
(
ref
MAST
,
Risso
2016
)
have
been
proposed
to
account
for
this
.
##
Variance
stabilization
by
log
+
pseudocount
A
simple
variance
stabilization
method
for
count
data
is
a
log
+
a
pseudocount
.
Note
that
for
negative
binomially
distributed
data
:
Hence
,
the
variance
is
greater
than
the
mean
.
\
Biocpkg
{
DESeq2
}
uses
the
NB
model
and
fits
dispersion
values
(
see
below
).
\[
\
text
{
Var
(
log
(
counts
+
pseudocount
))}
\
approx
=
\
frac
{
1
}{
counts
+
pseudocount
}
+
\
alpha
\]
where
$\
alpha
$
is
the
dispersion
.
The
formula
shows
that
for
genes
with
a
large
mean
,
the
variance
will
no
longer
depend
on
the
mean
and
essentially
be
equal
to
alpha
since
\(\
frac
{
1
}{
counts
+
pseudocount
}\)
is
very
close
to
zero
.
However
,
adding
pseudocounts
will
artificially
lower
the
the
variance
for
lowly
expressed
genes
,
possibly
leading
to
a
higher
rate
of
false
positive
calls
in
a
differential
expression
analysis
.
In
general
,
are
carefull
modelling
of
the
variance
mean
realtionship
is
preferable
to
ad
--
hoc
solutions
.
The
package
`
r
Biocpkg
(
"DESeq2"
)`
provides
variance
stabilizations
that
are
tailored
to
RNA
--
Seq
data
.
We
use
its
function
`
varianceStabilizingTransformation
`
to
obtain
normalize
and
variance
--
stabilized
count
data
.
```{
r
varStabCountData
}
```
#
Confounding
factors
,
Testing
,
Machine
Learning
,
move
to
new
doc
!
#
move
to
new
doc
:
Confounding
factors
,
Statistical
Testing
,
Machine
Learning
*
ZINBA
Wave
...
...
graphics_bioinf.html
View file @
efda91cc
This diff is collapsed.
Click to expand it.
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
.
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment