###### Jean-Karim's slides/content from a previous course run a couple of years ago
- what do you do with data over the lifetime of an experiment and afterwards
- from raw data to paper, and beyond
- preserve data integrity and maintain effective communication
- some sort of tracking mechanism for data through intermediate steps to the final product of an analysis
- keep things generic, not just for people who code or work on the command line
- funding bodies are starting to ask for a data management plan in proposals/applications
- need to make sure that anything in a paper lab book is also available in a computer-readable format
- documentation in general
- README for every project
- good practices in data processing
- examples
- filenames
- consistent, descriptive, no spaces (see the sketch after this list)
- judicious use of directory hierarchy
- discuss workflow with collaborators beforehand, to make sure that everyone can work with the file formats you plan to use
- tabular data, spreadsheets
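A minimal sketch of what a consistent, space-free naming convention could look like in practice; the `YYYY-MM-DD_project_sample_step.ext` scheme and the `build_filename` helper are illustrative assumptions, not a prescribed standard:

```python
import re
from datetime import date

# Hypothetical naming scheme: YYYY-MM-DD_project_sample_step.ext
# (consistent, descriptive, no spaces)
FILENAME_PATTERN = re.compile(
    r"^\d{4}-\d{2}-\d{2}_[a-z0-9]+(_[a-z0-9-]+)*\.[a-z0-9.]+$"
)

def build_filename(project, sample, step, ext, day=None):
    """Build a filename following the scheme above."""
    day = day or date.today()
    parts = [day.isoformat(), project, sample, step]
    name = "_".join(p.lower().replace(" ", "-") for p in parts) + "." + ext
    if not FILENAME_PATTERN.match(name):
        raise ValueError(f"filename does not match convention: {name}")
    return name

print(build_filename("yeastproj", "S12", "trimmed", "fastq.gz"))
# -> e.g. 2024-05-01_yeastproj_s12_trimmed.fastq.gz
```

Encoding the convention in a small helper (or running the validation regex over a directory) makes it easy to spot files that drift from the scheme.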
- data storage
- backups backups backups
- plan adequate storage in advance
- no one ever modifies primary data
- check file integrity regularly
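To make "check file integrity regularly" concrete, a possible sketch using SHA-256 checksums compared against a stored manifest; the `checksums.txt` manifest name and the `raw_data` directory are assumptions for illustration:

```python
import hashlib
from pathlib import Path

def sha256sum(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_manifest(manifest="checksums.txt"):
    """Compare current checksums against a manifest of 'checksum  path' lines."""
    for line in Path(manifest).read_text().splitlines():
        expected, name = line.split(maxsplit=1)
        status = "OK" if sha256sum(name) == expected else "MISMATCH"
        print(f"{status}\t{name}")

# Example: write the manifest once when primary data arrive ...
# for f in Path("raw_data").glob("*.fastq.gz"):
#     print(sha256sum(f), f)
# ... and run verify_manifest() as part of a regular check.
```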
- relational databases for data management and workflow tracking/documentation
- store links and metadata for large data files in the database, rather than the data itself, to prevent unmanageable growth of the database (see the sketch after this block)
- option of a full LIMS
- can be developed in-house
- has the advantage of being "aware" of specifics of the project
- use of browser-based database management systems
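As noted above, the database should hold links and metadata rather than the data files themselves. A rough sketch of what that could look like with SQLite (Python standard library); the `datafiles` table, its columns, and the `project_data.sqlite` filename are hypothetical:

```python
import sqlite3, os

# Hypothetical schema: the database stores *metadata and locations* of large
# data files (path, size, checksum, notes), never the file contents themselves.
con = sqlite3.connect("project_data.sqlite")
con.execute("""
    CREATE TABLE IF NOT EXISTS datafiles (
        id INTEGER PRIMARY KEY,
        path TEXT UNIQUE NOT NULL,   -- where the file actually lives
        size_bytes INTEGER,
        sha256 TEXT,
        description TEXT
    )
""")

def register_file(path, sha256, description=""):
    """Record a data file's location and metadata in the database."""
    con.execute(
        "INSERT OR REPLACE INTO datafiles (path, size_bytes, sha256, description) "
        "VALUES (?, ?, ?, ?)",
        (path, os.path.getsize(path), sha256, description),
    )
    con.commit()
```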
- version control/tracking
- for result tracking (changes between different analysis parameters etc)
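One possible way to tie result versions to the parameters that produced them, assuming the (small, text-based) result files live in a git repository; `commit_results` and the example parameters are illustrative, and large binary results are better tracked via checksums/metadata as above:

```python
import subprocess

def commit_results(files, params):
    """Commit (small) result files to git with the analysis parameters
    recorded in the commit message, so each result version is traceable."""
    message = "analysis results; parameters: " + ", ".join(
        f"{k}={v}" for k, v in sorted(params.items())
    )
    subprocess.run(["git", "add", *files], check=True)
    subprocess.run(["git", "commit", "-m", message], check=True)

# e.g. commit_results(["results/summary.csv"], {"min_quality": 30, "kmer": 21})
```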
- data management checklist
###### Charles
- data duplication is a big problem here, with people not keeping track of which version of a file they used
- another big problem is people not keeping track of analysis steps (software versions, parameters, etc.); is this a separate issue of experiment documentation?
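A sketch of one way to record analysis steps alongside the outputs: a small JSON "provenance" file holding the parameters, input files, and tool versions used for a run. The function name, file layout, and the example version numbers are assumptions, not an established format:

```python
import json, sys, platform
from datetime import datetime, timezone

def write_provenance(output_prefix, params, inputs, tools):
    """Write a small JSON record of how an analysis was run, next to its outputs.
    'tools' maps tool/package name -> version string (supplied by the caller)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "parameters": params,
        "inputs": inputs,          # e.g. paths (or better, path + checksum)
        "tool_versions": tools,
    }
    with open(f"{output_prefix}.provenance.json", "w") as fh:
        json.dump(record, fh, indent=2)

# e.g. write_provenance("results/run42",
#                       params={"min_quality": 30},
#                       inputs=["raw_data/s12.fastq.gz"],
#                       tools={"bwa": "0.7.17", "samtools": "1.19"})
```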
- Galaxy
- for the command line, what does it mean to have a pipeline?
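One way to frame the question: a command-line pipeline is an explicit, ordered set of steps in which each step's outputs feed the next, and steps can be re-run or skipped reproducibly; dedicated tools such as Make, Snakemake, or Nextflow exist for exactly this. A toy sketch of the idea in Python, with placeholder commands that do not represent a real analysis:

```python
import subprocess
from pathlib import Path

# A pipeline as an ordered list of (name, command, input, output) steps;
# completed steps (whose output already exists) are not re-run.
# The commands below are placeholders, not a recommended workflow.
STEPS = [
    ("qc",   "fastqc {inp} -o qc_out",    "raw.fastq.gz", "qc_out"),
    ("trim", "cutadapt -o {out} {inp}",   "raw.fastq.gz", "trimmed.fastq.gz"),
]

def run_pipeline(steps):
    for name, cmd, inp, out in steps:
        if Path(out).exists():
            print(f"[skip] {name}: {out} already exists")
            continue
        print(f"[run ] {name}")
        subprocess.run(cmd.format(inp=inp, out=out), shell=True, check=True)

run_pipeline(STEPS)
```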