Skip to content
GitLab
Projects
Groups
Snippets
Help
Loading...
Help
Help
Support
Community forum
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
S
stat_methods_bioinf
Project overview
Project overview
Details
Activity
Releases
Repository
Repository
Files
Commits
Branches
Tags
Contributors
Graph
Compare
Issues
0
Issues
0
List
Boards
Labels
Service Desk
Milestones
Merge Requests
0
Merge Requests
0
CI / CD
CI / CD
Pipelines
Jobs
Schedules
Operations
Operations
Incidents
Environments
Packages & Registries
Packages & Registries
Container Registry
Analytics
Analytics
CI / CD
Repository
Value Stream
Wiki
Wiki
Members
Members
Collapse sidebar
Close sidebar
Activity
Graph
Create a new issue
Jobs
Commits
Issue Boards
Open sidebar
Bernd Klaus
stat_methods_bioinf
Commits
4bd3e6ac
Commit
4bd3e6ac
authored
Sep 22, 2017
by
Bernd Klaus
Browse files
Options
Browse Files
Download
Email Patches
Plain Diff
refined LOESS section, added background on normalization
parent
a5082e43
Changes
4
Expand all
Hide whitespace changes
Inline
Side-by-side
Showing
4 changed files
with
420 additions
and
336 deletions
+420
-336
graphics_bioinf.R
graphics_bioinf.R
+43
-22
graphics_bioinf.Rmd
graphics_bioinf.Rmd
+130
-72
graphics_bioinf.html
graphics_bioinf.html
+247
-242
graphics_bioinf.pdf
graphics_bioinf.pdf
+0
-0
No files found.
graphics_bioinf.R
View file @
4bd3e6ac
...
...
@@ -36,7 +36,7 @@ library("vsn")
library
(
"matrixStats"
)
library
(
"pheatmap"
)
library
(
"tidyverse"
)
#library("printr")
theme_set
(
theme_gray
(
base_size
=
18
))
...
...
@@ -94,13 +94,21 @@ load(file.path(data_dir, "mtec_cell_anno.RData"))
load
(
file.path
(
data_dir
,
"mtec_gene_anno.RData"
))
load
(
file.path
(
data_dir
,
"tras.RData"
))
## ----mtec_count_table
----------------
------------------------------------
## ----mtec_count_table
, results="asis"
------------------------------------
mtec_counts
## ----mtec_cell_anno, results="asis"--------------------------------------
mtec_cell_anno
## ----trasENSEMBL, results="asis"-----------------------------------------
tras
mtec_gene_anno
## ----tidy_count----------------------------------------------------------
mtec_counts_tidy
<-
gather
(
mtec_counts
,
key
=
"cell_id"
,
value
=
"count"
,
-
ensembl_id
)
%>%
dplyr
::
mutate
(
is_tra
=
ensembl_id
%in%
tras
$
gene.ids
,
dplyr
::
mutate
(
is_tra
=
ensembl_id
%in%
tras
$
gene.ids
,
is_detected
=
count
>
0
)
%>%
left_join
(
mtec_cell_anno
,
by
=
c
(
"cell_id"
=
"cellID"
))
...
...
@@ -128,7 +136,7 @@ scatter_tra
scatter_tra
+
geom_rug
(
alpha
=
I
(
0.2
))
## ----ex_geom
------------
-------------------------------------------------
## ----ex_geom
, echo=FALSE
-------------------------------------------------
ggplot
(
tra_detected
,
aes
(
x
=
total_detected
,
y
=
tra
))
scatter_tra
+
...
...
@@ -140,33 +148,46 @@ lm_tra <- lm(tra ~ total_detected, data = tra_detected)
lm_tra
## ----loessExampleLinFit, dependson="fit_model"---------------------------
sim_data
<-
tibble
(
x
=
seq
(
from
=
1
,
to
=
10
,
length.out
=
100
),
y
=
x
^
3
+
x
^
2
+
rnorm
(
100
,
mean
=
0
,
sd
=
60
))
# create_data
y
<-
seq
(
from
=
1
,
to
=
10
,
length.out
=
100
)
a
<-
y
^
3
+
y
^
2
+
rnorm
(
100
,
mean
=
0
,
sd
=
30
)
dataL
<-
data.frame
(
a
=
a
,
y
=
y
)
qplot
(
y
,
a
,
data
=
dataL
)
ggplot
(
aes
(
x
,
y
),
data
=
sim_data
)
+
geom_point
()
+
geom_smooth
(
method
=
"lm"
)
# linear fit
linreg
<-
lm
(
a
~
y
,
data
=
dataL
)
## ----loessExampleFit, dependson="loessExampleLinFit"---------------------
(
qplot
(
y
,
a
,
data
=
dataL
)
+
geom_abline
(
slope
=
tidy
(
linreg
,
quick
=
TRUE
)[
2
,
2
],
intercept
=
tidy
(
linreg
,
quick
=
TRUE
)[
1
,
2
]))
tidy
(
linreg
)
sim_data
$
locFit
<-
predict
(
locfit
(
y
~
lp
(
x
,
nn
=
0.5
,
deg
=
1
),
data
=
sim_data
),
newdata
=
sim_data
$
x
)
dataL
$
LinReg
<-
predict
(
linreg
)
ggplot
(
aes
(
x
,
y
),
data
=
sim_data
)
+
geom_point
()
+
geom_smooth
(
method
=
"lm"
)
+
ggtitle
(
"Linear vs. local regression"
)
+
geom_line
(
aes
(
x
=
x
,
y
=
locFit
),
color
=
"coral3"
)
## ----loessExampleFit, dependson="loessExampleLinFit"---------------------
dataL
$
locFit
<-
predict
(
locfit
(
y
~
lp
(
a
,
nn
=
0.5
,
deg
=
1
),
data
=
dataL
),
newdata
=
dataL
$
a
)
## ----loessExercise, dependson="loessExampleLinFit", echo=FALSE-----------
sim_data
$
locFit
<-
predict
(
locfit
(
y
~
lp
(
x
,
nn
=
0.1
,
deg
=
1
),
data
=
sim_data
),
newdata
=
sim_data
$
x
)
ggplot
(
aes
(
x
,
y
),
data
=
sim_data
)
+
geom_point
()
+
geom_smooth
()
+
ggtitle
(
"Linear vs. local regression"
)
+
geom_line
(
aes
(
x
=
x
,
y
=
locFit
),
color
=
"coral3"
)
## ----compCells-----------------------------------------------------------
(
qplot
(
a
,
y
,
data
=
dataL
,
main
=
"Linear vs. local regression"
)
+
geom_line
(
aes
(
x
=
a
,
y
=
locFit
),
color
=
"dodgerblue3"
)
+
geom_line
(
aes
(
x
=
a
,
y
=
LinReg
),
color
=
"coral3"
))
ggplot
(
aes
(
x
=
cell_id
,
y
=
log
(
count
)),
data
=
filter
(
mtec_counts_tidy
,
cell_id
%in%
c
(
"cell77"
,
"cell6S35"
),
is_detected
==
TRUE
))
+
geom_boxplot
()
## ----import_crc----------------------------------------------------------
...
...
graphics_bioinf.Rmd
View file @
4bd3e6ac
...
...
@@ -9,12 +9,19 @@ output:
self_contained
:
true
toc_float
:
false
code_download
:
true
df_print
:
paged
BiocStyle
::
pdf_document2
:
toc
:
true
highlight
:
tango
df_print
:
paged
---
<
!--<
To
compile
this
document
graphics
.
off
();
rm
(
list
=
ls
());
rmarkdown
::
render
(
'graphics_bioinf.Rmd'
);
purl
(
'graphics_bioinf.Rmd'
)
graphics
.
off
();
rm
(
list
=
ls
());
rmarkdown
::
render
(
'graphics_bioinf.Rmd'
,
output_format
=
"all"
);
purl
(
'graphics_bioinf.Rmd'
)
rmarkdown
::
render
(
'graphics_bioinf.Rmd'
,
output_format
=
BiocStyle
::
pdf_document2
())
-->
```{
r
options
,
include
=
FALSE
}
...
...
@@ -64,7 +71,7 @@ library("vsn")
library
(
"matrixStats"
)
library
(
"pheatmap"
)
library
(
"tidyverse"
)
#
library
(
"printr"
)
theme_set
(
theme_gray
(
base_size
=
18
))
...
...
@@ -124,36 +131,47 @@ save(tras, file = file.path(data_dir, "tras.RData"),
#
mTEC
single
cell
-
RNA
-
Seq
data
```{
r
import_data
}
load
(
file
.
path
(
data_dir
,
"mtec_counts.RData"
))
load
(
file
.
path
(
data_dir
,
"mtec_cell_anno.RData"
))
load
(
file
.
path
(
data_dir
,
"mtec_gene_anno.RData"
))
load
(
file
.
path
(
data_dir
,
"tras.RData"
))
```
We
summarize
single
-
cell
transcriptomes
in
the
form
of
count
matrices
in
which
each
row
represents
a
gene
and
each
column
represents
one
cell
.
The
matrix
is
filled
with
the
number
of
sequenced
fragments
whose
genomic
alignment
overlaps
with
the
genomic
coordinates
of
the
genes
.
For
this
,
we
only
use
cells
for
which
more
than
40
%
of
the
sequenced
fragments
could
be
uniquely
assigned
to
the
Mouse
reference
genome
.
We
also
discarded
28
cells
from
the
batch
4
(
since
they
were
from
another
cell
type
that
was
sequenced
together
with
some
of
the
mTECs
).
It
is
sensible
to
only
use
highly
variable
genes
in
the
analysis
of
single
cell
data
(
HVG
),
the
original
publication
identified
more
than
9
,
000
as
highly
variable
using
a
method
published
in
Brennecke
et
.
al
.
2013.
We
retain
those
genes
for
further
analysis
.
genomic
coordinates
of
the
genes
.
It
looks
like
this
:
We
also
import
a
table
with
genes
annotated
as
tissue
restricted
antigens
(
tras
)
from
Shinto
et
.
al
.
2013
and
annotation
for
mouse
genes
from
ENSEMBL
.
```{
r
mtec_count_table
,
results
=
"asis"
}
mtec_counts
```
The
annotation
of
the
cells
is
stored
in
a
table
called
`
mtec_cell_anno
`
```{
r
import_data
}
load
(
file
.
path
(
data_dir
,
"mtec_counts.RData"
))
load
(
file
.
path
(
data_dir
,
"mtec_cell_anno.RData"
))
load
(
file
.
path
(
data_dir
,
"mtec_gene_anno.RData"
))
load
(
file
.
path
(
data_dir
,
"tras.RData"
))
```{
r
mtec_cell_anno
,
results
=
"asis"
}
mtec_cell_anno
```
#
Graphics
in
R
It
is
sensible
to
only
use
highly
variable
genes
(
HVG
)
in
the
analysis
of
single
cell
data
.
The
original
publication
identified
more
than
9
,
000
as
highly
variable
using
a
method
published
in
Brennecke
et
.
al
.
2013
and
we
only
retain
those
genes
for
further
analysis
.
#
ggplot2
{#
gg
}
We
also
import
a
table
with
genes
annotated
as
tissue
restricted
antigens
(`
tras
`)
from
Shinto
et
.
al
.
2013
and
annotation
for
mouse
genes
from
ENSEMBL
(`
mtec_gene_anno
`).
```{
r
trasENSEMBL
,
results
=
"asis"
}
tras
mtec_gene_anno
```
We
will
use
this
data
to
explore
plotting
in
R
.
#
Graphics
in
R
--
ggplot2
{#
gg
}
to
assess
gene
detection
rates
`
r
CRANpkg
(
"ggplot2"
)`
is
a
package
by
Hadley
Wickham
that
implements
the
idea
of
*
grammar
of
graphics
*
--
a
concept
created
by
Leland
Wilkinson
in
his
book
of
the
same
name
.
Comprehensive
documentation
for
the
package
can
...
...
@@ -161,42 +179,40 @@ be found on <http://ggplot2.tidyverse.org/>. The online documentation includes
example
use
cases
for
each
of
the
graphic
types
that
are
introduced
in
this
lab
(
and
many
more
)
and
is
an
invaluable
resource
when
creating
figures
.
Single
cell
RNA
-
Seq
data
has
many
zero
due
to
a
limited
capture
efficiency
and
biases
in
the
reverse
transcription
.
It
is
thus
interesting
to
see
how
Single
cell
RNA
-
Seq
data
commonly
has
many
zero
counts
due
to
a
limited
capture
efficiency
and
biases
in
the
reverse
transcription
step
.
It
is
thus
interesting
to
see
how
the
total
number
of
genes
expressed
(
at
least
one
count
)
relates
to
the
number
of
TRA
encoding
genes
expressed
.
The
single
cell
data
also
contains
mTEC
cells
select
based
on
surface
markers
,
we
exclude
those
and
make
sure
to
inspect
only
protein
coding
genes
.
In
order
to
do
this
,
we
first
turn
the
count
table
:
```{
r
mtec_count_table
}
mtec_counts
```
In
order
to
do
this
,
we
first
turn
the
count
table
which
contains
the
counts
for
each
cell
in
separate
columns
into
a
tidy
data
frame
.
We
`
gather
`
all
columns
except
for
the
first
one
and
then
add
a
column
telling
us
whether
a
specific
genes
is
coding
for
a
TRA
and
whether
it
was
detected
or
not
.
use
the
`
tras
`
table
to
add
columns
telling
us
whether
a
specific
genes
is
coding
for
a
TRA
and
whether
it
was
detected
or
not
.
Then
,
we
join
the
annotation
to
the
tidy
table
Then
,
we
join
the
annotation
of
the
cells
`
mtec_cell_anno
`
to
the
tidy
table
:
```{
r
tidy_count
}
mtec_counts_tidy
<-
gather
(
mtec_counts
,
key
=
"cell_id"
,
value
=
"count"
,
-
ensembl_id
)
%>%
dplyr
::
mutate
(
is_tra
=
ensembl_id
%
in
%
tras
$
gene
.
ids
,
dplyr
::
mutate
(
is_tra
=
ensembl_id
%
in
%
tras
$
gene
.
ids
,
is_detected
=
count
>
0
)
%>%
left_join
(
mtec_cell_anno
,
by
=
c
(
"cell_id"
=
"cellID"
))
```
Now
we
can
compute
the
number
of
TRA
genes
detected
per
cell
and
filter
After
joining
we
can
filter
the
non
--
protein
coding
genes
as
well
as
the
cells
that
have
been
processed
using
FACS
based
on
a
surface
marker
.
using
FACS
based
on
a
surface
marker
.
The
column
`
is_tra
`
contains
a
logical
vector
,
we
replace
`
TRUE
`
and
`
FALSE
`
by
actual
category
names
.
Finally
,
we
obtain
a
tidy
table
with
two
columns
per
cell
:
the
number
of
tras
and
non
-
tras
expressed
.
There
is
a
subtle
change
in
observations
:
we
go
from
genes
within
a
single
cell
as
our
unit
to
single
cells
.
We
then
compute
the
number
of
TRA
/
non
--
TRA
genes
detected
per
cell
using
the
function
`
tally
`
that
performs
groupwise
counting
on
grouped
data
frames
.
Following
this
,
we
spread
the
`
is_tra
`
into
to
columns
and
a
column
with
the
total
count
of
the
detected
genes
.
```{
r
tra_per_cell
}
...
...
@@ -212,7 +228,11 @@ tra_detected
```
We
visualize
this
relationship
in
a
scatterplot
with
ggplot2
.
Finally
,
after
spreading
the
`
is_tra
`
we
obtain
a
tidy
table
with
two
columns
per
cell
:
the
number
of
tras
and
non
-
tras
expressed
genes
.
We
add
a
column
holding
the
total
number
of
detected
genes
by
computing
their
sum
.
We
now
visualize
this
relationship
in
a
scatterplot
with
ggplot2
.
```{
r
travsall
,
fig
.
cap
=
"Total number of genes vs TRA"
}
scatter_tra
<-
ggplot
(
tra_detected
,
aes
(
x
=
total_detected
,
y
=
tra
))+
...
...
@@ -225,7 +245,7 @@ scatter_tra
We
just
wrote
our
first
"sentence"
using
the
grammar
of
graphics
.
Let
us
deconstruct
this
sentence
.
First
,
we
specified
the
data
frame
that
contains
the
data
,
`
mtec_counts_tidy
`.
First
,
we
specified
the
data
frame
that
contains
the
data
,
`
tra_detected
`.
Then
we
told
`
ggplot
`
via
the
`
aes
`.
This
stands
for
`
aesthetic
`
argument
(
Aesthetics
are
visual
property
of
the
objects
in
your
plot
.)
and
tells
`
r
CRANpkg
(
"ggplot2"
)`
which
variables
...
...
@@ -252,7 +272,7 @@ We set (instead of mapped) the alpha value to 0.2, this increases the
transparency
of
the
rugs
and
alleviates
overplotting
.
##
Exercise
:
Geometries
and
aspect
ratios
##
#
Exercise
:
Geometries
and
aspect
ratios
1.
What
happens
if
you
omit
the
calls
to
`
geom_point
()`
and
`
coord_equal
()`
?
...
...
@@ -262,7 +282,7 @@ out how to add a smoothing curve to the plot. Can you change the smoother
to
a
regression
line
?
```{
r
ex_geom
}
```{
r
ex_geom
,
echo
=
FALSE
}
ggplot
(
tra_detected
,
aes
(
x
=
total_detected
,
y
=
tra
))
scatter_tra
+
...
...
@@ -311,7 +331,8 @@ y_i = a + b x_{i} + \varepsilon_i; \quad
We
can
of
course
always
add
more
predictors
to
the
linear
function
.
The
coefficient
\(
b
\)
is
called
the
__slope__
and
\(
a
\)
is
called
the
__intercept__
.
We
can
fit
a
linear
regression
via
a
call
to
the
function
`
lm
()`.
The
regression
We
can
fit
a
linear
regression
via
a
call
to
the
function
`
lm
()`,
which
stands
for
"linear model"
.
The
regression
model
is
specified
using
R
's formula notation.
```{r regresssion_tra}
...
...
@@ -322,7 +343,7 @@ lm_tra
As we can see, the estimated
slope is ~ `r tidy(lm_tra, quick = TRUE)[2, "estimate"]`, indicating
that we have a proportion of `r tidy(lm_tra, quick = TRUE)[2, "estimate"]`
tra expressing genes on average per cell
for the highly variable genes
.
tra expressing genes on average per cell.
This is in line with the supplementary figure 1 of the original publication,
which uses the full set of expressed genes, not just the highly variable
ones.
...
...
@@ -330,7 +351,6 @@ ones.
# Local regression (LOESS)
Local regression is a commonly used approach for fitting flexible non--linear
functions, which involves computing many local linear regression fits and combining
them. Local regression is a very useful technique both for data visualization and
...
...
@@ -339,27 +359,17 @@ but it usually feasible with today's hardware. We illustrate the local regressio
using
the
`
r
CRANpkg
(
"locfit"
)
`
package
on
simulated
data
.
We
first
fit
a
linear
regression
line
to
simulated
data
that
follows
a
polynomial
trend
and
see
that
it
does
not
really
data
that
follows
a
polynomial
trend
\(
y
\)
and
see
that
it
does
not
really
fit
well
.
```{
r
loessExampleLinFit
,
dependson
=
"fit_model"
}
sim_data
<-
tibble
(
x
=
seq
(
from
=
1
,
to
=
10
,
length
.
out
=
100
),
y
=
x
^
3
+
x
^
2
+
rnorm
(
100
,
mean
=
0
,
sd
=
60
))
#
create_data
y
<-
seq
(
from
=
1
,
to
=
10
,
length
.
out
=
100
)
a
<-
y
^
3
+
y
^
2
+
rnorm
(
100
,
mean
=
0
,
sd
=
30
)
dataL
<-
data
.
frame
(
a
=
a
,
y
=
y
)
qplot
(
y
,
a
,
data
=
dataL
)
#
linear
fit
linreg
<-
lm
(
a
~
y
,
data
=
dataL
)
(
qplot
(
y
,
a
,
data
=
dataL
)
+
geom_abline
(
slope
=
tidy
(
linreg
,
quick
=
TRUE
)[
2
,
2
],
intercept
=
tidy
(
linreg
,
quick
=
TRUE
)[
1
,
2
]))
tidy
(
linreg
)
ggplot
(
aes
(
x
,
y
),
data
=
sim_data
)
+
geom_point
()
+
geom_smooth
(
method
=
"lm"
)
dataL
$
LinReg
<-
predict
(
linreg
)
```
We
now
use
the
function
`
locfit
`
to
perform
a
local
regression
on
the
...
...
@@ -371,29 +381,75 @@ The lower this percentage, the more closely the line will follow the data points
```{
r
loessExampleFit
,
dependson
=
"loessExampleLinFit"
}
dataL
$
locFit
<-
predict
(
locfit
(
y
~
lp
(
a
,
nn
=
0.5
,
deg
=
1
),
data
=
dataL
),
newdata
=
dataL
$
a
)
sim_data
$
locFit
<-
predict
(
locfit
(
y
~
lp
(
x
,
nn
=
0.5
,
deg
=
1
),
data
=
sim_data
),
newdata
=
sim_data
$
x
)
ggplot
(
aes
(
x
,
y
),
data
=
sim_data
)
+
geom_point
()
+
geom_smooth
(
method
=
"lm"
)
+
ggtitle
(
"Linear vs. local regression"
)
+
geom_line
(
aes
(
x
=
x
,
y
=
locFit
),
color
=
"coral3"
)
```
###
Exercise
:
Local
fit
experiments
1.
Check
the
help
for
`
geom_smooth
`
to
find
out
what
`
r
CRANpkg
(
"ggplot2"
)`
uses
as
default
smoothing
methods
.
2.
Play
with
the
`
nn
`
parameter
of
the
locfit
in
the
plot
,
how
does
affect
the
fit
?
```{
r
loessExercise
,
dependson
=
"loessExampleLinFit"
,
echo
=
FALSE
}
(
qplot
(
a
,
y
,
data
=
dataL
,
main
=
"Linear vs. local regression"
)
+
geom_line
(
aes
(
x
=
a
,
y
=
locFit
),
color
=
"dodgerblue3"
)
+
geom_line
(
aes
(
x
=
a
,
y
=
LinReg
),
color
=
"coral3"
))
sim_data
$
locFit
<-
predict
(
locfit
(
y
~
lp
(
x
,
nn
=
0.1
,
deg
=
1
),
data
=
sim_data
),
newdata
=
sim_data
$
x
)
ggplot
(
aes
(
x
,
y
),
data
=
sim_data
)
+
geom_point
()
+
geom_smooth
()
+
ggtitle
(
"Linear vs. local regression"
)
+
geom_line
(
aes
(
x
=
x
,
y
=
locFit
),
color
=
"coral3"
)
```
#
Normalization
of
bulk
and
single
cell
RNA
--
Seq
data
Before
we
can
discuss
further
analysis
of
our
data
set
,
we
need
to
"normalize"
or
"calibrate"
the
data
:
Transforming
it
so
that
the
data
from
multiple
samples
/
single
cells
is
comparable
to
each
other
.
You
can
roughly
think
of
this
process
as
transforming
the
raw
count
data
such
that
common
units
are
used
.
To
illustrate
this
,
let
's look
at "cell77" and "cell6S35" by creating a boxplot of their log counts
of detected genes:
```{r compCells}
ggplot(aes(x = cell_id, y = log(count)),
data = filter(mtec_counts_tidy,
cell_id %in% c("cell77", "cell6S35"),
is_detected == TRUE)) +
geom_boxplot()
#
Normalization
of
bulk
and
single
cell
data
```
Notice how `r CRANpkg("ggplot2")` automatically groups the data
by cell, computes the neccessary values and
then displays the boxplot
We can see that their medians are quite different: "cell77" has much
higher counts on average for the detected genes than "cell6S35" so that the counts
for the two cells are not comparable without proper rescaling of their
counts. The neccessary rescaling factors are called "size factors" and we
discuss them next.
## Size factors for RNA--Seq data
Before
exploring
normalization
of
single
cell
data
,
we
look
at
this
issue
in
bulk
RNA
--
Seq
:
Next
generation
sequencing
data
is
often
analyzed
in
the
form
of
read
counts
assigned
to
genomic
bins
.
Such
bins
typically
correspond
to
all
the
exons
of
a
gene
,
or
an
isoform
for
RNA
--
Seq
data
.
Before exploring size factors for single cell data, we look at this issue
in a bulk RNA--Seq data set.
T
he
aim
of
normalization
is
to
remove
between
--
sample
differences
As we know, t
he aim of normalization is to remove between--sample differences
related to library sizes and other systematic technical effects.
A popular method for the normalization in sequencing data is global scaling:
all counts for a given sample are scaled by factor such that the scaled
...
...
@@ -405,7 +461,9 @@ __normalized sample counts__ = __raw sample counts__ / __ sample size factor__
## Normalization of a Colectoral Cancer example data set
We
will
illustrate
this
using
a
dataset
from
the
[
recount2
resource
](
http
://
dx
.
doi
.
org
/
10.1038
/
nbt
.3838
).
[
L
ö
hr
et
.
al
.,
2013
](
https
://
doi
.
org
/
10.1371
/
journal
.
pone
.0067461
)
have
We will illustrate this using a dataset from the [recount2 resource](http://dx.doi.org/10.1038/nbt.3838).
[Löhr et. al., 2013](https://doi.org/10.1371/journal.pone.0067461) have
applied RNA-Seq on colorectal normal,
tumor and metastasis tissues from 8 patients to generate gene
expression patterns. They were particularly interested
...
...
graphics_bioinf.html
View file @
4bd3e6ac
This diff is collapsed.
Click to expand it.
graphics_bioinf.pdf
0 → 100644
View file @
4bd3e6ac
File added
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
.
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment