Commit ef04b6e4 authored by Christian Arnold

Added README, moved config.json.README to the doc folder

parent 6f30a5e4
README 0 → 100644
This repository defines an ATAC-Seq pipeline using Snakemake.
Folders and files should be self-explanatory. However, a few notes:
1) The folder data is supposed to contain the input files for real analyses, organized in subfolders.
2) The doc folder is still a bit messy and contains various documents that can be ignored. It also contains the original ATAC-Seq paper. This folder should be cleaned up eventually.
3) The example folder is important. Use it to test whether you can run the pipeline, using a small example analysis consisting of two samples. First, adjust the parameters in the config.json file accordingly. For an explanation of the parameters, see the config.json.README in the src/Snakemake folder. To run the example, take a look at the src/Snakemake/runSnakefile.sh file. This is the current version of a wrapper script that runs Snakemake with user-defined settings. Edit it accordingly. The run should produce output identical to what you see in the output directory. There are a few issues at the moment that I am still trying to resolve:
a) The number of cores, in total and per rule, currently has to be defined twice (in runSnakefile.sh: nCores, and in the config.json).
b) Either because I am misunderstanding how Snakemake works or because this does not work for some reason, the config file also has to be specified twice at the moment (once in runSnakefile.sh: configFile line, and once in the Snakefile itself: configfile). Make sure to use the same config file path in both places.
c) The cluster file can probably be improved by adding custom names for each rule etc. If you have further ideas, let me know.
d) The output directory also has to be specified twice: once in runSnakefile.sh, line outputDir, and once in the config.json. The reason for this is that the workflow graph etc. is also automatically put into the Logs_and_Benchmarks directory by the wrapper script, and the script has no direct access to the config.json file to retrieve the output directory. I will improve this when I have a free minute.
4) The output folder is supposed to contain the output of all real analyses (except for the example, which is kind of special to allow a quick test and look). The structure within the folder should correspond to that of the data directory.
5) The src folder contains the source code of the project. The R subfolder can be ignored, as we switched completely to Snakemake.
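
One possible way to address issues 3a), 3b) and 3d) is to let a small helper read the config.json once and hand the values to the wrapper, so that they do not have to be repeated. The following is only a minimal sketch, not the current implementation. The key names "outdir" and "nCores" and both function names are assumptions made for illustration; replace them with whatever the config.json actually uses. It also assumes that the config file path can be passed to Snakemake through its --configfile command-line option instead of being repeated in the Snakefile.

```python
import json


def load_wrapper_settings(config_path, outdir_key="outdir", cores_key="nCores"):
    """Read the output directory and core count from a JSON config file.

    The default key names are hypothetical; adjust them to the keys
    that are really present in config.json.
    """
    with open(config_path) as handle:
        config = json.load(handle)
    return config[outdir_key], int(config[cores_key])


def build_snakemake_command(snakefile, config_path, n_cores):
    """Assemble a Snakemake call that names the config file and cores only once."""
    return [
        "snakemake",
        "--snakefile", snakefile,
        "--configfile", config_path,
        "--cores", str(n_cores),
    ]
```

With something like this, the wrapper could call load_wrapper_settings("config.json") to obtain the output directory for the Logs_and_Benchmarks files, and pass the list returned by build_snakemake_command to subprocess.run.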
File moved