Added README, moved config.json.README to the doc folder

Commit ef04b6e4 authored 8 years ago by Christian Arnold
--- a/README
+++ b/README
+This repository defines an ATAC-Seq pipeline using Snakemake.
+
+Folders and files should be self-explanatory; however, here are a few notes:
+  1) The data folder is supposed to contain the input files for real analyses, organized in subfolders.
+  2) The doc folder is still a bit messy and contains various documents that can be ignored. It also contains the original ATAC-Seq paper. This should be cleaned up eventually.
+  3) The example folder is important. Use it to test whether you can run the pipeline, using a small example analysis consisting of two samples. To run it, adjust the parameters in the config.json file accordingly; for an explanation of the parameters, see config.json.README in the src/Snakemake folder. The run should produce output identical to what you see in the output directory. To start it, take a look at src/Snakemake/runSnakefile.sh, the current version of a wrapper script that runs Snakemake with user-defined settings, and edit it accordingly. There are a few issues at the moment that I am still trying to improve:
+  a) The total number of cores and the number of cores per rule currently have to be defined twice (once in runSnakefile.sh via nCores, and once in config.json).
+  b) Either because I am misunderstanding how Snakemake works or because this does not work for some reason, the config file currently also has to be specified twice (once in runSnakefile.sh, in the configFile line, and once in the Snakefile itself, via configfile). Make sure to use the same config file path in both places.
+  c) The cluster file can probably be improved by adding custom names for each rule etc. If you have further ideas, let me know.
+  d) The output directory also has to be specified twice: once in runSnakefile.sh (the outputDir line) and once in config.json. The reason is that the wrapper script automatically puts the workflow graph etc. into the Logs_and_Benchmarks directory, and the script has no direct access to config.json to retrieve the output directory. I will improve this when I have a free minute.
+  4) The output folder is supposed to contain all real analyses output (except for the example, which is kind of special to allow a quick test and look). The structure within the folder should correspond to the data directory.
+  5) The src folder contains the source of the project. The R subfolder can be ignored, as we switched completely to Snakemake.
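Issues a), b) and d) all come from the wrapper not reading config.json itself. A minimal sketch of one way around this, assuming config.json has top-level keys for the core count and the output directory (the key names `nCores` and `outputDir` and the function `build_snakemake_command` are hypothetical), is a small Python wrapper that derives everything from that single file and passes the same path to Snakemake via `--configfile`:

```python
import json
from pathlib import Path


def build_snakemake_command(config_path, snakefile="Snakefile",
                            cores_key="nCores", outdir_key="outputDir"):
    """Build a snakemake command line from a single config.json.

    ASSUMPTION: the key names "nCores" and "outputDir" are hypothetical;
    adjust them to the keys config.json actually uses.
    """
    config = json.loads(Path(config_path).read_text())

    # Read cores and output directory once, from config.json only.
    cores = int(config[cores_key])
    outdir = Path(config[outdir_key])

    # Directory where the wrapper would place the workflow graph etc.
    log_dir = outdir / "Logs_and_Benchmarks"

    # Pass the same config path to Snakemake so it is maintained in one place.
    cmd = [
        "snakemake",
        "--snakefile", str(snakefile),
        "--configfile", str(config_path),
        "--cores", str(cores),
    ]
    return cmd, log_dir
```

The wrapper could then execute the returned list with `subprocess.run(cmd, check=True)` and use `log_dir` for the workflow graph, so neither the core count nor the output directory needs a second copy in the shell script.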
--- a/src/Snakemake/config.json.README
+++ b/src/Snakemake/config.json.README