Bernd Klaus / stat_methods_bioinf

Commit 18746b2a authored Aug 16, 2017 by Bernd Klaus

added boxplot for raw counts

parent a3f4143f

Showing 1 changed file with 40 additions and 7 deletions

graphics_bioinf.Rmd (+40 -7)

...
@@ -389,7 +389,7 @@ dataL$locFit <- predict(locfit(y~lp(a, nn=0.5, deg=1), data=dataL),
# Normalization and variance stabilization of bulk and single-cell data

## Size factors for bulk RNA-Seq

Before exploring normalization of single-cell data, we look at this issue in bulk RNA-Seq. Next-generation sequencing data is often analyzed in the
...
@@ -406,7 +406,7 @@ counts are comparable across samples. Such a factor is commonly called a
__normalized sample counts__ = __raw sample counts__ / __sample size factor__

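The formula above leaves open how the size factor itself is obtained. A common choice, used by DESeq2 and hinted at in the commented-out note at the end of this section, is the median-of-ratios estimator: for sample $j$, $\hat{s}_j = \operatorname{median}_i \, k_{ij} / \big(\prod_{v=1}^{m} k_{iv}\big)^{1/m}$, where $k_{ij}$ is the count of gene $i$ in sample $j$ and $m$ is the number of samples. The base-R sketch below applies this estimator to a made-up toy matrix; `toy_counts`, `toy_size_factors` and `toy_normalized` are illustrative names, not objects used elsewhere in this document.

```{r size_factor_sketch}
# Hypothetical toy count matrix: 4 genes (rows) x 3 samples (columns).
# s2 is sequenced twice as deeply as s1, and s3 three times as deeply.
toy_counts <- matrix(c( 10,  20,  30,
                       100, 200, 300,
                         5,  10,  15,
                        50, 100, 150),
                     nrow = 4, byrow = TRUE,
                     dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3")))

# 1. Pseudo-reference sample: log geometric mean of each gene across samples
log_geo_means <- rowMeans(log(toy_counts))

# Genes with a zero count in any sample have a log geometric mean of -Inf;
# they cannot contribute a finite ratio and are left out of the estimate
keep <- is.finite(log_geo_means)

# 2. Size factor per sample: median ratio of its counts to the reference
toy_size_factors <- apply(toy_counts[keep, , drop = FALSE], 2, function(x) {
  exp(median(log(x) - log_geo_means[keep]))
})

# 3. Normalized counts = raw counts / size factor, column by column
toy_normalized <- sweep(toy_counts, 2, toy_size_factors, "/")

print(toy_size_factors)
print(toy_normalized)
```

Because every gene in the toy matrix scales exactly with sequencing depth, the size factors come out in the ratio 1:2:3 and each row of `toy_normalized` is constant across samples.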
## A colorectal cancer example data set

We will illustrate this using a dataset from the [recount2 resource](http://dx.doi.org/10.1038/nbt.3838). [Löhr et al., 2013](https://doi.org/10.1371/journal.pone.0067461) have applied RNA-Seq to colorectal normal,
...
@@ -460,17 +460,15 @@ Therefore, we can preview the actual count data matrix like so:
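```{r pre_crc}
library(SummarizedExperiment)  # provides assay() and colData()

# Raw counts: genes (Ensembl IDs) in rows, samples (SRA run IDs) in columns
counts_crc <- assay(crc_data)
counts_crc[1:5, 1:5]

# Counts of two selected runs; head() replaces the interactive View(),
# which is not suited to a knitted report
counts_p4_meta <- counts_crc[, c("SRR837829", "SRR837847")]
head(counts_p4_meta)
```
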
The annotation for the samples looks like this:

```{r pre_crc_genes}
# Sample annotation (column data) of the recount2 object
colnames(colData(crc_data))
colData(crc_data)[1:5, c("title", "characteristics", "mapped_read_count")]
nrow(colData(crc_data))
```
...
@@ -479,7 +477,42 @@ We have various normal, tumor and metastasis tissues from 8 donors. The authors
had one pipeline for the quantification of miRNA and one for the quantification of "ordinary" genes. However, since the recount2 resource uses a unified pipeline with annotation data from the [GENCODE project](https://www.gencodegenes.org/stats/archive.html), which also contains non-coding features, we can focus on the samples annotated as "mRNA".

We now create a column data table that contains the appropriate sample annotation. Specifically, we extract the sample annotations from the `title` column and subset to the samples processed with the mRNA-based quantification.

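```{r create_crc_col_data}
# Parse quantification, patient and tissue from the sample title and keep
# only the samples quantified with the mRNA pipeline. The row names are
# turned into a column first, and dplyr::select() is called explicitly to
# avoid clashes with select() from other loaded packages.
col_data_crc <- as.data.frame(colData(crc_data)) %>%
  rownames_to_column(var = "sample_id") %>%
  dplyr::select(sample_id, title, characteristics, mapped_read_count) %>%
  as_tibble() %>%
  tidyr::extract(title, into = c("quantification", "patient", "tissue"),
                 regex = "([[:alnum:]]+)_([[:alnum:]]+)_([[:alnum:]]+)") %>%
  dplyr::filter(quantification == "mRNA")
```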
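The `regex` argument of `tidyr::extract()` splits each title into three underscore-separated capture groups, so the titles are presumably of the form `<quantification>_<patient>_<tissue>`. The base-R sketch below reproduces that split with `regexec()` and `regmatches()` on made-up titles (the real recount2 titles are not shown here), then keeps the mRNA rows as the `dplyr::filter()` step does; `toy_titles` and the other `toy_` objects are illustrative only.

```{r title_regex_sketch}
# Made-up titles in the assumed "<quantification>_<patient>_<tissue>" layout
toy_titles <- c("mRNA_P1_normal", "miRNA_P1_tumor", "mRNA_P2_metastasis")
title_regex <- "([[:alnum:]]+)_([[:alnum:]]+)_([[:alnum:]]+)"

# regexec() locates the full match and the three capture groups;
# regmatches() turns those positions into strings
toy_matches <- regmatches(toy_titles, regexec(title_regex, toy_titles))

# Drop the full match (first element) and stack the groups into a matrix
toy_parsed <- do.call(rbind, lapply(toy_matches, `[`, -1))
colnames(toy_parsed) <- c("quantification", "patient", "tissue")

# Keep only the mRNA-quantified samples
toy_mrna <- toy_parsed[toy_parsed[, "quantification"] == "mRNA", , drop = FALSE]
print(toy_mrna)
```

`[[:alnum:]]` does not match the underscore, so each capture group stops at the next `_`.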

We now plot the log counts of the mRNA samples to see whether their distributions are comparable. To avoid taking the log of zero, we only retain, within each sample, the genes with at least one count:

```{r mrna_counts}
# Long format: one row per gene/sample pair, restricted to the mRNA samples;
# entries with fewer than one count are dropped so that log2() stays finite
counts_crc_tidy <- rownames_to_column(data.frame(counts_crc, check.names = FALSE),
                                      var = "ensembl_id") %>%
  as_tibble() %>%
  gather(key = "sample_id", value = "count", -ensembl_id) %>%
  dplyr::filter(sample_id %in% col_data_crc$sample_id) %>%
  dplyr::filter(count >= 1)

# One box per sample: distribution of the log2 counts
ggplot(counts_crc_tidy, aes(x = sample_id, y = log2(count), fill = sample_id)) +
  geom_boxplot()
```
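Each box in the plot summarizes the log2 counts of one sample; by default, `geom_boxplot()` draws the box edges at the 25% and 75% quantiles and the middle line at the median. The base-R sketch below, on a made-up two-sample matrix, mirrors the reshaping and filtering steps of the chunk and computes those three statistics directly; `toy_counts`, `toy_long` and `toy_box_stats` are illustrative names.

```{r boxplot_stats_sketch}
# Hypothetical toy count matrix: 5 genes x 2 samples, one zero count per sample
toy_counts <- matrix(c(0, 4, 16, 64, 256,   # sample s1
                       2, 8, 0, 32, 128),   # sample s2
                     ncol = 2,
                     dimnames = list(paste0("g", 1:5), c("s1", "s2")))

# Long format, one row per gene/sample pair (the layout gather() produces)
toy_long <- data.frame(
  ensembl_id = rep(rownames(toy_counts), times = ncol(toy_counts)),
  sample_id  = rep(colnames(toy_counts), each = nrow(toy_counts)),
  count      = as.vector(toy_counts)
)

# Keep entries with at least one count so that log2() stays finite
toy_long <- toy_long[toy_long$count >= 1, ]

# Per-sample quartiles of log2 counts: box edges (25%, 75%) and median line
toy_box_stats <- sapply(split(log2(toy_long$count), toy_long$sample_id),
                        quantile, probs = c(0.25, 0.5, 0.75))
print(toy_box_stats)
```

Comparing these per-sample quantiles numerically is a quick check on whether the raw count distributions are shifted relative to each other, which is the kind of difference that size factors are meant to remove.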

<!-- the median gene expression per sample relative to a reference -->
<!-- sample formed by the geometric mean of each single gene across samples. -->
...