GRaNIE issues
https://git.embl.de/grp-zaugg/GRaNIE/-/issues
2023-06-14T09:48:40Z

https://git.embl.de/grp-zaugg/GRaNIE/-/issues/92
addTFBS() breaks for mm10 on HOCOMOCO
2023-06-14T09:48:40Z
Simone Procaccia

When I run:
```
> GRN = addTFBS(GRN, motifFolder = motifFolder, TFs = "all", filesTFBSPattern = "_TFBS",
+ fileEnding = ".bed.gz", forceRerun = TRUE)
```
Following the standard tutorial but with mm10 data and the HOCOMOCO mm10 database provided by you I get the following error:
```
INFO [2023-05-15 11:30:49] Checking database folder for matching files: /PWMScan_HOCOMOCOv10
INFO [2023-05-15 11:30:49] Found 422 matching TFs: AHR, AIRE, ALX1, ANDR, AP2A, AP2B, AP2C, AP2D, ARI3A.D, ARI3A.S, ARI5B, ARNT, ARNT2, ATF1, ATF2, ATF3, ATOH1, BACH1, BARX2, BATF, BCL6, BHE40, BMAL1, BRAC, BRCA1, CDC5L, CDX1, CDX2, CEBPA, CEBPB, CEBPD, CEBPE, CEBPG, CEBPZ, COE1, COT1.B, COT1.S, COT2.B, COT2.S, CREB1, CREM, CRX.A, CRX.S, CTCF, CUX1, CXXC1, DBP, DDIT3, DLX2, DLX3, E2F1, E2F2, E2F3, E2F4, E2F5, E2F6, E2F7, E4F1, EGR1, EGR2, EGR3, EGR4, EHF, ELF1, ELF2, ELF3, ELF5, ELK1, ELK3, ELK4, ENOA, EPAS1, ERG, ERR1, ERR2, ERR3, ESR1, ESR2.B, ESR2.S, ETS1, ETS2, ETV4, ETV5, EVI1, EVX1, EVX2, FEV, FLI1, FOS, FOSB, FOSL1, FOSL2, FOXA1, FOXA2, FOXA3, FOXC1, FOXC2, FOXD1, FOXD3, FOXF1, FOXF2, FOXI1, FOXJ2, FOXJ3.A, FOXJ3.S, FOXM1, FOXO1, FOXO3, FOXO4, FOXP2, FOXP3, FOXQ1, FUBP1, GABP1, GABPA, GATA1, GATA2, GATA3, GATA4, GATA5, GATA6, GCM1, GCR.C, GCR.S, GFI1, GFI1B, GLI1, GLI2, GLI3, GLIS3, HAND1, HBP1, HEN1, HES1, HESX1, HEY2, HIC1, HIF1A, HINFP, HLF, HLTF, HMGA1, HMGA2, HNF1A, HNF1B, HNF4A, HNF4G, HNF6, HSF1, HSF2, HTF4, HXA1, HXA10, HXA13, HXA5, HXA7, HXA9, HXB1, HXB6, HXB7, HXB8, HXC6, HXC8, HXD10, HXD13, HXD4, HXD9, IKZF1, INSM1, IRF1, IRF2, IRF3, IRF4, IRF5, IRF7, IRF8, IRF9, ISL1, ITF2, JUN, JUNB, JUND, KAISO, KLF1, KLF15, KLF3, KLF4, KLF6, KLF8, LEF1, LHX2, LHX3, MAF, MAFA, MAFB, MAFG, MAFK.A, MAFK.S, MAX, MAZ, MBD2, MCR, MECP2, MEF2A, MEF2C, MEF2D, MEIS1, MEIS2, MITF, MLXPL, MSX2, MSX3, MTF1, MXI1, MYB, MYBB, MYC, MYCN, MYF6, MYOD1, MYOG, NANOG.A, NANOG.S, NDF1, NF2L1, NF2L2, NFAC1.A, NFAC1.S, NFAC2, NFAC3, NFAC4, NFAT5, NFE2, NFIA.C, NFIA.S, NFIL3, NFKB1, NFKB2, NFYA.D, NFYA.S, NFYB, NFYC, NKX21, NKX22, NKX25, NKX28, NKX31, NKX32, NOBOX, NR0B1, NR1D1, NR1H2, NR1H4, NR1I2.S, NR1I3.S, NR2C1, NR2C2, NR2E3, NR2F6, NR4A1, NR4A2, NR4A3, NR5A2, NR6A1, NRF1, OLIG2, ONEC2, OTX1, OTX2, OVOL1, P53, P63, P73, PAX2.D, PAX2.S, PAX5.D, PAX5.S, PAX6, PAX8, PBX1, PBX2, PBX3, PDX1, PEBB, PIT1, PITX2, PKNX1, PLAG1.D, PLAG1.S, PO2F1, PO2F2, PO3F1, PO3F2, PO4F2, PO5F1, PO6F1, 
PPARA.C, PPARA.S, PPARD, PPARG.A, PPARG.S, PRD14, PRDM1, PRGR.C, PRGR.S, PROP1, PRRX1, PRRX2, PTF1A, PURA, RARA.C, RARA.S, RARB, RARG.C, RARG.S, REL, RELB, REST, RFX1, RFX2, RFX3, RORA, RORG, RREB1, RUNX1, RUNX2, RUNX3, RXRA, RXRB, RXRG, SMAD1, SMAD2, SMAD3, SMAD4, SMRC1, SNAI1, SNAI2, SOX10, SOX13, SOX15, SOX17, SOX18, SOX2, SOX3, SOX4, SOX5, SOX9, SP1.A, SP1.S, SP2, SP3, SP4, SPI1, SPIB, SPZ1, SRBP1, SRBP2, SRF, SRY, STA5A, STA5B, STAT1, STAT2, STAT3, STAT4, STAT6, STF1, SUH, TAL1.A, TAL1.S, TBP, TBX2, TBX20, TBX3, TBX5, TCF7, TEAD1, TEAD3, TEAD4, TEF, TF2L1, TF65.C, TF65.S, TF7L2, TFCP2, TFDP1, TFE2.A, TFE2.S, TFE3, TFEB, TGIF1.S, THA.C, THA.S, THB.C, THB.S, TLX1.D, TLX1.S, TWST1, TYY1, UBIP1, USF1, USF2, VDR.C, VDR.S, VSX2, WT1, XBP1, YBOX1, ZBT18, ZBT7A, ZBTB6, ZEB1, ZEP1, ZEP2, ZFHX3, ZFX, ZIC1, ZIC2, ZIC3, ZN143, ZN148, ZN423
INFO [2023-05-15 11:30:49] Use all TF from the database folder /PWMScan_HOCOMOCOv10
INFO [2023-05-15 11:30:49] Reading file /PWMScan_HOCOMOCOv10/translationTable.csv
INFO [2023-05-15 11:30:49] Finished successfully. Execution time: 0 secs
INFO [2023-05-15 11:30:49] Filtering the following 422 TFs as they are not present in the RNA-Seq data: AHR,AIRE,ALX1,ANDR,AP2A,AP2B,AP2C,AP2D,ARI3A.D,ARI3A.S,ARI5B,ARNT,ARNT2,ATF1,ATF2,ATF3,ATOH1,BACH1,BARX2,BATF,BCL6,BHE40,BMAL1,BRAC,BRCA1,CDC5L,CDX1,CDX2,CEBPA,CEBPB,CEBPD,CEBPE,CEBPG,CEBPZ,COE1,COT1.B,COT1.S,COT2.B,COT2.S,CREB1,CREM,CRX.A,CRX.S,CTCF,CUX1,CXXC1,DBP,DDIT3,DLX2,DLX3,E2F1,E2F2,E2F3,E2F4,E2F5,E2F6,E2F7,E4F1,EGR1,EGR2,EGR3,EGR4,EHF,ELF1,ELF2,ELF3,ELF5,ELK1,ELK3,ELK4,ENOA,EPAS1,ERG,ERR1,ERR2,ERR3,ESR1,ESR2.B,ESR2.S,ETS1,ETS2,ETV4,ETV5,EVI1,EVX1,EVX2,FEV,FLI1,FOS,FOSB,FOSL1,FOSL2,FOXA1,FOXA2,FOXA3,FOXC1,FOXC2,FOXD1,FOXD3,FOXF1,FOXF2,FOXI1,FOXJ2,FOXJ3.A,FOXJ3.S,FOXM1,FOXO1,FOXO3,FOXO4,FOXP2,FOXP3,FOXQ1,FUBP1,GABP1,GABPA,GATA1,GATA2,GATA3,GATA4,GATA5,GATA6,GCM1,GCR.C,GCR.S,GFI1,GFI1B,GLI1,GLI2,GLI3,GLIS3,HAND1,HBP1,HEN1,HES1,HESX1,HEY2,HIC1,HIF1A,HINFP,HLF,HLTF,HMGA1,HMGA2,HNF1A,HNF1B,HNF4A,HNF4G,HNF6,HSF1,HSF2,HTF4,HXA1,HXA10,HXA13,HXA5,HXA7,HXA9,HXB1,HXB6,HXB7,HXB8,HXC6,HXC8,HXD10,HXD13,HXD4,HXD9,IKZF1,INSM1,IRF1,IRF2,IRF3,IRF4,IRF5,IRF7,IRF8,IRF9,ISL1,ITF2,JUN,JUNB,JUND,KAISO,KLF1,KLF15,KLF3,KLF4,KLF6,KLF8,LEF1,LHX2,LHX3,MAF,MAFA,MAFB,MAFG,MAFK.A,MAFK.S,MAX,MAZ,MBD2,MCR,MECP2,MEF2A,MEF2C,MEF2D,MEIS1,MEIS2,MITF,MLXPL,MSX2,MSX3,MTF1,MXI1,MYB,MYBB,MYC,MYCN,MYF6,MYOD1,MYOG,NANOG.A,NANOG.S,NDF1,NF2L1,NF2L2,NFAC1.A,NFAC1.S,NFAC2,NFAC3,NFAC4,NFAT5,NFE2,NFIA.C,NFIA.S,NFIL3,NFKB1,NFKB2,NFYA.D,NFYA.S,NFYB,NFYC,NKX21,NKX22,NKX25,NKX28,NKX31,NKX32,NOBOX,NR0B1,NR1D1,NR1H2,NR1H4,NR1I2.S,NR1I3.S,NR2C1,NR2C2,NR2E3,NR2F6,NR4A1,NR4A2,NR4A3,NR5A2,NR6A1,NRF1,OLIG2,ONEC2,OTX1,OTX2,OVOL1,P53,P63,P73,PAX2.D,PAX2.S,PAX5.D,PAX5.S,PAX6,PAX8,PBX1,PBX2,PBX3,PDX1,PEBB,PIT1,PITX2,PKNX1,PLAG1.D,PLAG1.S,PO2F1,PO2F2,PO3F1,PO3F2,PO4F2,PO5F1,PO6F1,PPARA.C,PPARA.S,PPARD,PPARG.A,PPARG.S,PRD14,PRDM1,PRGR.C,PRGR.S,PROP1,PRRX1,PRRX2,PTF1A,PURA,RARA.C,RARA.S,RARB,RARG.C,RARG.S,REL,RELB,REST,RFX1,RFX2,RFX3,RORA,RORG,RREB1,RUNX1,RUNX2,RUNX3,RXRA,RXRB,RXRG,SMAD1,SMAD2,SMAD3,SMAD4,SMRC1,SNAI1,SNAI
2,SOX10,SOX13,SOX15,SOX17,SOX18,SOX2,SOX3,SOX4,SOX5,SOX9,SP1.A,SP1.S,SP2,SP3,SP4,SPI1,SPIB,SPZ1,SRBP1,SRBP2,SRF,SRY,STA5A,STA5B,STAT1,STAT2,STAT3,STAT4,STAT6,STF1,SUH,TAL1.A,TAL1.S,TBP,TBX2,TBX20,TBX3,TBX5,TCF7,TEAD1,TEAD3,TEAD4,TEF,TF2L1,TF65.C,TF65.S,TF7L2,TFCP2,TFDP1,TFE2.A,TFE2.S,TFE3,TFEB,TGIF1.S,THA.C,THA.S,THB.C,THB.S,TLX1.D,TLX1.S,TWST1,TYY1,UBIP1,USF1,USF2,VDR.C,VDR.S,VSX2,WT1,XBP1,YBOX1,ZBT18,ZBT7A,ZBTB6,ZEB1,ZEP1,ZEP2,ZFHX3,ZFX,ZIC1,ZIC2,ZIC3,ZN143,ZN148,ZN423
ERROR [2023-05-15 11:30:49] No shared Tfs.
########################################################################################
# An error occurred. See details above. If you think this is a bug, please contact us. #
########################################################################################
Error in .checkAndLogWarningsAndErrors(NULL, message, isWarning = FALSE) :
No shared Tfs.
########################################################################################
# An error occurred. See details above. If you think this is a bug, please contact us. #
########################################################################################
```
Any clue why this is happening?
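All 422 TFs are filtered at once, which may point to an identifier mismatch between the translation table and the RNA-Seq data (for example human versus mouse gene IDs) rather than to genuinely missing genes. This is an assumption, not a confirmed diagnosis. A minimal base-R sketch of such a check follows; the column names `HOCOID` and `ENSEMBL` and all IDs are made up for illustration and would need to be replaced by the real contents of `translationTable.csv` and the gene IDs of the count table.

```r
# Toy stand-ins (made-up IDs): in practice, read translationTable.csv and take
# the gene IDs from the RNA-Seq count table instead.
translation <- data.frame(
  HOCOID  = c("TF_A", "TF_B", "TF_C"),
  ENSEMBL = c("ENSG00000000001", "ENSG00000000002", "ENSG00000000003"),
  stringsAsFactors = FALSE
)
rnaGeneIDs <- c("ENSMUSG00000000001", "ENSMUSG00000000002", "ENSMUSG00000000003")

# How many IDs are shared between the translation table and the RNA-Seq data?
shared <- intersect(translation$ENSEMBL, rnaGeneIDs)
cat("Shared IDs:", length(shared), "\n")

# Tabulate the ID prefixes (everything before the trailing digits) on both
# sides; different prefixes hint at different species or ID types.
idPrefix <- function(ids) table(sub("[0-9.]+$", "", ids))
print(idPrefix(translation$ENSEMBL))
print(idPrefix(rnaGeneIDs))
```

If the prefixes differ, as in this toy example, no TF can be matched regardless of which genes are expressed.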
Thank you :smile:

https://git.embl.de/grp-zaugg/GRaNIE/-/issues/82
Deprecated `pages` parameter in the Vignette.
2022-06-29T13:20:36Z
Maksim Kholmatov

In the [workflow example](https://grp-zaugg.embl-community.io/GRaNIE/articles/GRaNIE_workflow.html#network-and-enrichment-analyses-for-filtered-connections) on the website, the network and enrichment analysis functions still use the `pages` parameter even though it's not present.

https://git.embl.de/grp-zaugg/GRaNIE/-/issues/84
duplicate `row.names` in CommunitiesEnrichment
2022-06-28T18:36:52Z
Maksim Kholmatov

Didn't manage to get through `performAllNetworkAnalyses`:
```r
> GRN = GRaNIE::performAllNetworkAnalyses(GRN, ontology = c("GO_BP"), forceRerun = TRUE)
Error in `.rowNamesDF<-`(x, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘positive regulation of extrinsic apoptotic sign...’
```
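The warning shows a term label that is cut off with `...`, which suggests (as an assumption, not a confirmed diagnosis) that two different GO terms collapse to the same shortened label before being used as row names. The base-R sketch below reproduces that kind of collision with two example terms and shows one possible way around it with `make.unique()`; the truncation length of 47 characters is chosen only to match the label in the warning.

```r
# Two distinct example terms that share a long common prefix
terms <- c(
  "positive regulation of extrinsic apoptotic signaling pathway",
  "positive regulation of extrinsic apoptotic signaling pathway via death domain receptors"
)

# Shorten the labels the way a plotting helper might do it (assumed behaviour)
shortLabels <- paste0(substr(terms, 1, 47), "...")
collides <- anyDuplicated(shortLabels) > 0

scores <- data.frame(score = c(1.5, 2.5))

# Assigning duplicated row names fails with the same error as in the report
errorMessage <- tryCatch(
  suppressWarnings({
    rownames(scores) <- shortLabels
    NA_character_
  }),
  error = function(e) conditionMessage(e)
)

# Possible workaround: make the shortened labels unique before assigning them
rownames(scores) <- make.unique(shortLabels)
print(rownames(scores))
```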
Judging by the files that it manages to create, I think the problem came up when doing `plotCommunitiesEnrichment`.

Christian Arnold

https://git.embl.de/grp-zaugg/GRaNIE/-/issues/83
Unclear error message from `visualizeGRN`
2022-06-28T09:46:58Z
Maksim Kholmatov

```r
> GRN = GRaNIE::visualizeGRN(GRN)
Error in UseMethod("rename") :
no applicable method for 'rename' applied to an object of class "NULL"
```
R 4.2 and latest Bioconductor version of GRaNIE == 1.0.2.

Rim Moussa

https://git.embl.de/grp-zaugg/GRaNIE/-/issues/81
No `graph` slot in GRaNIE object.
2022-06-28T09:43:53Z
Maksim Kholmatov

When loading an older object from a couple of months ago I get this non-critical error when just printing the object:
```r
> x
Object of class: GRaNIE ( version 0.99.0 )
Data summary:
Number of peaks (filtered, all): 22188, 31449
Number of genes (filtered, all): 14463, 60620
Parameters:
Provided metadata:
name : Monocytes_R848
file_peaks : /g/scb/zaugg/kholmato/PROJECTS/Myeloid_ageing/ChIP/6.PeakCalling/stringent/consensusPeaks.minOverlap10.allBams.overlaps.bed.gz
file_rna : /g/scb/zaugg/kholmato/PROJECTS/Myeloid_ageing/RNA/6.GeneCounts/sampleSummary_singleEnd/geneCounts_beforeDuplicatesRemoval_clean_raw.tsv
file_sampleMetadata : /g/scb/zaugg/kholmato/PROJECTS/Myeloid_ageing/GRN/input/Sample_Metadata_Merged.tsv
genomeAssembly : hg38
Connections:
TF-peak links (1355 with FDR < 0.3)
peak-gene links (150103 with 250000 bp promoter range)
TF-peak-gene links (379 with TF-peak FDR 0.2 and peak-gene FDR NA)
Network-related:
Error in (new("standardGeneric", .Data = function (object) :
no slot of name "graph" for this object of class "GRN"
```
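One way a print method could tolerate objects saved by an older package version is to test for a slot before reading it. The sketch below uses a toy S4 class, not GRaNIE's actual class definition, and the helper `getSlotOrWarn` is hypothetical; only `.hasSlot()` and `slot()` from the `methods` package are real functions.

```r
library(methods)

# Toy S4 class standing in for an object saved by an older package version;
# it deliberately has no "graph" slot.
setClass("ToyGRN", representation(config = "list"))
oldObject <- new("ToyGRN", config = list(version = "0.99.0"))

# Return the slot content if the slot exists; otherwise warn and return NULL.
getSlotOrWarn <- function(object, slotName) {
  if (!.hasSlot(object, slotName)) {
    warning("Slot '", slotName, "' not found; the object may have been ",
            "created with an older package version.", call. = FALSE)
    return(NULL)
  }
  slot(object, slotName)
}

# The missing slot yields NULL (plus a warning) instead of an error
graphSlot <- suppressWarnings(getSlotOrWarn(oldObject, "graph"))
configSlot <- getSlotOrWarn(oldObject, "config")
```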
Maybe this should be handled without throwing the e-word, with just a warning about package versions?

Improve R package

https://git.embl.de/grp-zaugg/GRaNIE/-/issues/80
include_TF_gene_correlations = TRUE creates completely duplicated rows in results
2022-02-09T13:07:17Z
Mang Ching Lai

GRN_connections.all = GRN::getGRNConnections(GRN, type = "all.filtered", include_TF_gene_correlations = TRUE)
include_TF_gene_correlations=TRUE creates completely duplicated rows, changing to FALSE solves the issue
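Until the cause is found, the number of exact duplicates can be quantified and the rows collapsed with base R. The following sketch uses a made-up connections table; the column names are placeholders and not necessarily those returned by `getGRNConnections`.

```r
# Made-up connections table in which the first row appears twice
connections <- data.frame(
  TF   = c("TF_A", "TF_A", "TF_B"),
  peak = c("peak1", "peak1", "peak2"),
  gene = c("gene1", "gene1", "gene2"),
  stringsAsFactors = FALSE
)

# duplicated() flags rows that are identical to an earlier row in all columns
nDuplicated <- sum(duplicated(connections))
cat("Completely duplicated rows:", nDuplicated, "\n")

# unique() keeps the first occurrence of each row
connectionsUnique <- unique(connections)
```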
GRN_0.10.1

Christian Arnold

https://git.embl.de/grp-zaugg/GRaNIE/-/issues/71
GC Correction for TF-peak links and additional criteria for foreground and background
2021-12-14T13:02:17Z
Christian Arnold

As discussed, an additional feature will be implemented that constructs a GC-corrected background in addition to the GC-unaware way we are doing it now. The FDRs for the TF-peak links will then also have two versions.
Initial questions for the specific implementation and edge cases we have to deal with:
1. Do we take the GC content of the peaks or of the specific TFBS? Both should be possible, as we originally assign a 1 or 0 to a TF-peak pair given the predicted TFBS. If we use the GC content of the peak, this might not correlate well with the GC content of the actual TFBS, especially for larger peaks. If we use the GC content of the TFBS, a peak will usually have different GC contents, namely the one specific to the exact TFBS that overlaps with the peak. I did not think it through fully, but I guess the TFBS-specific one should be preferred? I am not yet sure, though, what it means for the implementation or the logic of our FDR procedure when a peak has different GC contents.
2. What to do if the GC-specific distribution of the foreground cannot be matched with the background? For example, 80% of the foreground is GC-rich, while the background contains only few GC-rich regions. At which point do we have to flag the background curve because it does NOT match the foreground well? Shall we use a new metric, from 0 to 1, that captures how well the GC matching of the background worked, so we get an idea of how well this works overall? How would this metric be constructed?
3. Sizes of background and foreground: Shall we require minimum sizes for foreground and background so we have some confidence in the distributions that are the basis for our FDR procedure? Taking SP1 as an example, is a 3k background enough for us with a 20k foreground, and what do we do if either foreground or background is too small? I guess setting to NA would be best, as we should not have (false) confidence when numbers are too low. This point is actually independent of the GC correction.
4. How to balance between "maximize the number of peaks in the background" versus "mimic the foreground GC distribution as much as possible"? Which quantitative rules do we want to apply? Example: 10k foreground regions, 15k background regions. 80% of the foreground regions are GC-rich (80% GC and above), while only 2k regions in the background are GC-rich. We can then either (1) use all 2k from the background, so that this also becomes 80% relatively speaking, and fill the remaining 20% with a matching that is as good as possible for the other GC bins. This still means we will not have more than about 2.5k regions in the background in total (2k / 0.8), because the GC-rich regions should make up 80% of all regions, so we throw away many background regions. Alternatively, (2) we use more background regions, but then we cannot maintain the 80% threshold; how do we balance this? We need specific rules. I am sure that across datasets we will run into all of these edge cases, so I want to have a good idea of them already while implementing.
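The arithmetic behind option (1) in question 4 can be sketched as follows: if the background must reproduce the foreground's GC-bin proportions exactly, the scarcest bin caps the total background size. The function `maxMatchedBackground` and the two-bin setup are hypothetical and only illustrate the trade-off; this is not the implemented procedure.

```r
# Largest background that reproduces the foreground GC-bin proportions exactly.
# fgCounts, bgCounts: named numeric vectors of region counts per GC bin.
maxMatchedBackground <- function(fgCounts, bgCounts) {
  stopifnot(identical(names(fgCounts), names(bgCounts)))
  keep <- fgCounts > 0
  # Each bin b allows at most bgCounts[b] / (fgCounts[b] / sum(fgCounts))
  # background regions in total; the smallest of these limits is the cap.
  nTotal <- floor(min(bgCounts[keep] * sum(fgCounts) / fgCounts[keep]))
  # Number of background regions to draw from each bin
  perBin <- floor(nTotal * fgCounts / sum(fgCounts))
  list(total = nTotal, perBin = perBin, unused = bgCounts - perBin)
}

# Example from question 4: 10k foreground regions (80% GC-rich),
# 15k background regions of which only 2k are GC-rich
fg <- c(other = 2000, gcRich = 8000)
bg <- c(other = 13000, gcRich = 2000)
res <- maxMatchedBackground(fg, bg)
print(res)
```

In this example the strict matching uses all GC-rich background regions but leaves most of the remaining background unused, which is exactly the loss that option (2) would trade against a less faithful GC distribution.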
I will start with this asap after the Moritz data analysis, so it might only be in the new year.

Additional functions and methods
Christian Arnold

https://git.embl.de/grp-zaugg/GRaNIE/-/issues/25
Peak-gene confounding factors
2021-12-14T13:01:23Z
Christian Arnold

Summarize here the current understanding we have about how and why peak-gene correlations may be biased also in the background.
Issue: often random networks have still high signal (as shown in enrichment of low p-values when picking random peak-gene pairs to calculate the correlation).
Current decision: as long as the signal in p-values is higher for positive correlations than for negative ones in the real data, and also higher for positive correlations in real vs random, there is no need to be concerned. (e.g. for the AML data it looks good). However, if the signal is similarly high for negative and positive correlations we should be concerned and try to figure out why this is the case.
Ideas discussed for selecting threshold for peak-gene correlations:
* use the distribution of negative correlations (of the real network) as background for the positive correlations (e.g. by mirroring the distribution of negative correlations on zero).
* same but use the p-value distribution to calculate the FDR, with fg = p-values from positive correlations and bg = p-values from negative correlations.

Additional and independent QC plots
Christian Arnold

https://git.embl.de/grp-zaugg/GRaNIE/-/issues/66
TF / Gene / peak importance/ Network statistics
2021-12-14T13:01:08Z
Christian Arnold

As mentioned by Aryan today in his update, network measures for TF (and potentially also peak and gene) importance might shed light on the underlying biology of the GRN. I feel this is not really part of Aryan's prediction pipeline but more part of the construction. If you agree, we can talk about implementing them at some point, and how.
I could imagine having one or multiple scores for each node in the network as part of the output, and a few QC plots for the overall distribution of the different measures.
Just here so we don't forget.

Additional and independent QC plots
Christian Arnold

https://git.embl.de/grp-zaugg/GRaNIE/-/issues/78
Rerun GRNs with higher peak-gene neighborhood size
2021-12-14T13:00:56Z
Christian Arnold

The goal is to re-run parts of the GRN pipeline for some (most) of the datasets and to change the parameter `promoterRange` to 2000000 (2 million) to get more familiar with it. These are the steps to do:
1. For the particular dataset (check /g/scb/zaugg/zaugg_shared/data/GRN/data and subdirectories within), locate the runGRN.R file in the input directory. If not present, let me know, I am currently cleaning this up.
2. Change `promoterRange` to 2,000,000 and rerun the whole script. Change the `dir_output` and add a suffix like "peakGene2000000" or something like this. Let's be conservative and save all in a new output folder altogether.
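The change in step 2 might look roughly like this inside a run script. The base directory is a placeholder and the exact structure of `runGRN.R` is not shown here, so treat the variable `dir_output_base` as an assumption.

```r
# New neighborhood size for peak-gene links
promoterRange <- 2000000

# Hypothetical base output directory; replace with the one used in runGRN.R
dir_output_base <- "output"

# Write everything to a separate folder so existing results stay untouched
suffix <- paste0("peakGene", format(promoterRange, scientific = FALSE))
dir_output <- paste(dir_output_base, suffix, sep = "_")
print(dir_output)
```

Encoding the parameter value in the folder name keeps runs with different `promoterRange` settings side by side.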
Let's try this for one dataset and then we can adjust if something is unexpected or can be improved. I'll update the procedure above then.
**Let's start with /g/scb/zaugg/zaugg_shared/data/GRN/data/Macrophages/input/runGRN.R.** Note that for the Macrophages dataset, we iterate in the script over the 4 different subpopulations + all, so the GRN is run a total of 5 times in an identical fashion.

Organize and QC datasets
Rim Moussa

https://git.embl.de/grp-zaugg/GRaNIE/-/issues/76
Vignettes and package website
2021-12-14T13:00:49Z
Christian Arnold

After some checking today, I decided to go for the following solution for everything related to GRN documentation and vignettes. It looks like this: https://pkgdown.r-lib.org/
I will use the pkgdown package, which essentially produces a website that is part of the Git repository, includes all documentation-related information, and can be built automatically from within R.
To Dos:
- [ ] finish help text for vignette
- [ ] write Intro vignette
- [ ] finish class documentation
- [x] decide for style: https://gallery.shinyapps.io/117-shinythemes/
- [ ] check and setup Google Analytics tracker
- [x] customize and check options

Improve R package
Christian Arnold

https://git.embl.de/grp-zaugg/GRaNIE/-/issues/40
Reduce object size for GRN object
2021-12-14T13:00:25Z
Christian Arnold

- [x] Sparse matrices for binary matrices (AR classification slot)
- [x] RNA counts: Save only shuffled order for permutation variants
- [ ] Save rownames only once and not multiple times
- [x] Identify all intermediate files and add option to delete them
- [x] peak-TF binding matrix
- [x] Annotation field for gene annotation that is stored only once and not once per permutation
- [ ] countsNorm slot for ATAC
- [x] annotation consensus peaks vs ATAC$consensusPeaks
- [ ] Stats slot
- [x] TF-peak links: Discard high-FDR ones via a parameter

Improve R package
Christian Arnold

https://git.embl.de/grp-zaugg/GRaNIE/-/issues/59
Need for nested cross-validation?
2021-12-14T12:59:59Z
Christian Arnold

A general question for discussion here: Currently, Aryan performs a non-nested cross-validation. For the random forest, are there any hyperparameters that are set manually or that are even chosen in a data-dependent way? If yes, the overall performance estimate may be biased, and a nested cross-validation scheme that includes the hyperparameters might be a good idea.
See https://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html, for example.

Clean up and organising prediction pipeline
Aryan Kamal

https://git.embl.de/grp-zaugg/GRaNIE/-/issues/35
Prediction pipeline codes clean up
2021-12-14T12:59:10Z
Aryan Kamal

Clean up the code and functions.
- [ ] logger
- [ ] generalization
- [ ] efficiency

Clean up and organising prediction pipeline
Aryan Kamal

https://git.embl.de/grp-zaugg/GRaNIE/-/issues/38
Add TF activity as optional input to GRN
2021-12-14T12:57:56Z
Christian Arnold

In addition to ATAC and RNA, TF activity can be added to the GRN framework.
Let's summarize here the details so we can start thinking about how to do it, who does it and other details.
@zaugg and @berest Could you quickly summarize what you had in mind?
In addition to TF expression (which is sample-specific, one value per sample), we then also have TF activity (which is specific to the pairwise comparison, which we currently do not integrate into the GRN framework at all, so there is only one value per TF). How exactly would the link creation work then?

Additional functions and methods

https://git.embl.de/grp-zaugg/GRaNIE/-/issues/77
IHW for peak-gene p-value adjustment
2021-05-20T19:50:39Z
Christian Arnold

- [x] Implement IHW and new parameters for functions
- [x] Implement diagnostic plots
- [ ] check with datasets which covariates work and how much more power they give

Additional functions and methods
Christian Arnold

https://git.embl.de/grp-zaugg/GRaNIE/-/issues/72
Speed improvements
2021-05-19T21:04:14Z
Christian Arnold

Running time is too long for a large number of peaks.
- TF-peak FDR
Will be extended and edited here in the near future.

Improve R package
Christian Arnold

https://git.embl.de/grp-zaugg/GRaNIE/-/issues/39
GRN visualization
2021-05-19T21:01:28Z
Christian Arnold

Ultimately, a versatile and dynamic visualization of the network would be nice. A Shiny app, for example. Ivan (@berest) and Holly are working on this at the moment, and could share a prototype later for integration with this project.

Additional functions and methods
Aryan Kamal

https://git.embl.de/grp-zaugg/GRaNIE/-/issues/16
Interactive visualization
2021-05-19T21:00:49Z
Christian Arnold

Ivan has a prototype for this, ask him
Shiny app

Christian Arnold

https://git.embl.de/grp-zaugg/GRaNIE/-/issues/75
PCA improvements
2021-05-19T20:59:39Z
Christian Arnold

Improve the following:
- [x] Allow PCA also for cases when prenormalized data has been provided
- [x] make colors nicer, use viridis colors
- [x] Scree plots using the same data, remove PCATools dependency, improve legends, one scree plot per nVariablRows value
- [x] automatically convert Dates to proper colors, and logicals
- [x] double-check that results still look similar to before after the changes

Additional functions and methods
Christian Arnold