## Bioinformatics engineer exercise

Welcome to the exercise. You will complete four small tasks (A to D).

Time: 45 minutes from the start of the exercise.

Each task is self-contained. If you cannot complete a task, don't worry; just move on to the next one.

You may use the internet. You must be able to explain how you obtained your results. Direct help from other people is not allowed.

You may use any tools you like. You can create as many intermediate files as you need, but please do not modify the original files.

There are no "hidden" bugs; the input file formats are consistent.

---

### Task A: Filter a table for gene IDs

In this exercise you will filter the input table "results.txt" for the gene IDs listed in "ids.txt".

#### Input:

- tab-separated file "results.txt"
- ID file "ids.txt"

#### Preferred Output:

- "outputA.txt": filtered "results.txt" table containing only the IDs listed in "ids.txt"

Note: Treat this as a step that could be plugged into a pipeline (assume that results.txt and ids.txt always have the same format).
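
One possible approach to Task A, given as a minimal sketch rather than a reference solution. It assumes details the task text does not state: that "results.txt" has a header row, that the gene ID sits in the first column, and that "ids.txt" holds one ID per line. The function names `load_ids` and `filter_table` and the `id_column` parameter are hypothetical and exist only for illustration.

```python
def load_ids(ids_path):
    """Read the wanted IDs into a set (assumes one ID per line; blank lines are skipped)."""
    with open(ids_path) as handle:
        return {line.strip() for line in handle if line.strip()}


def filter_table(results_path, ids_path, output_path, id_column=0):
    """Copy the header and every row whose ID is in the wanted set.

    Assumes the first line of the table is a header and that the ID is in
    column `id_column` (0-based). Returns the number of data rows kept.
    """
    wanted = load_ids(ids_path)
    kept = 0
    with open(results_path) as src, open(output_path, "w") as dst:
        dst.write(src.readline())  # header row is copied unchanged
        for line in src:
            fields = line.rstrip("\n").split("\t")
            # Skip rows that are too short to contain the ID column.
            if len(fields) > id_column and fields[id_column] in wanted:
                dst.write(line)  # matching rows are written verbatim
                kept += 1
    return kept


# Example call for the files named in the task (the original files stay untouched):
# filter_table("results.txt", "ids.txt", "outputA.txt")
```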

---

### Task B: Annotate a table

The table "notannotated.txt" needs to be annotated using "mapping.txt". Both files contain the column "ID", which is used for the mapping.

#### Input:

- tab-separated table "notannotated.txt" containing the columns "ID, p-value, logFC"
- tab-separated table "mapping.txt" containing the columns "ID, name"

#### Preferred Output:

- "outputB.txt": annotated results table containing the columns "ID, name, p-value, logFC" (in any order)

Note: Treat this as a step that could be plugged into a pipeline (assume that notannotated.txt and mapping.txt always have the same format).
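
The annotation step amounts to a left join on "ID". The sketch below is one way to do it with the Python standard library only, under assumptions not confirmed by the task text: both files start with a header row that uses exactly the column names "ID" and "name", each ID occurs at most once in "mapping.txt" (if not, the last entry wins), and IDs without a mapping get the placeholder "NA". The helper names `read_tsv` and `annotate` and the `missing` parameter are hypothetical.

```python
def read_tsv(path):
    """Return (header, rows) for a tab-separated file with a header row."""
    with open(path) as handle:
        lines = [line.rstrip("\n") for line in handle if line.strip()]
    header = lines[0].split("\t")
    rows = [line.split("\t") for line in lines[1:]]
    return header, rows


def annotate(table_path, mapping_path, output_path, missing="NA"):
    """Insert a "name" column right after "ID" (left join on ID).

    Every row of the input table is kept; IDs that are absent from the
    mapping get the `missing` placeholder. Returns the number of rows written.
    """
    map_header, map_rows = read_tsv(mapping_path)
    id_idx, name_idx = map_header.index("ID"), map_header.index("name")
    names = {row[id_idx]: row[name_idx] for row in map_rows}

    header, rows = read_tsv(table_path)
    key_idx = header.index("ID")
    with open(output_path, "w") as out:
        new_header = header[:key_idx + 1] + ["name"] + header[key_idx + 1:]
        out.write("\t".join(new_header) + "\n")
        for row in rows:
            name = names.get(row[key_idx], missing)
            out.write("\t".join(row[:key_idx + 1] + [name] + row[key_idx + 1:]) + "\n")
    return len(rows)


# Example call for the files named in the task:
# annotate("notannotated.txt", "mapping.txt", "outputB.txt")
```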

---

### Task C: Prepare a results table for biologists to view

The results from our analysis have to be presented to wet-lab biologists who have no experience in bioinformatics.

#### Input:

- tab-delimited "deseq.results.txt"

#### Output:

- Formatted Excel or OpenOffice table named "outputC"

Note: This is the final preparation of the table and does not have to be done in a "programming" fashion. Edit the table with Excel, OpenOffice, or any editor you like.

---

### Task D: Comparison of two raw data samples

The table "raw.data.txt" contains two samples, "A" and "B", of unknown properties. We want to check whether the biological replicates "worked". Plot your results.

#### Input:

- tab-separated table "raw.data.txt"

#### Output:

- Any type of output, including graphical.
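
One common way to judge whether two samples agree is to correlate their log-transformed values and look at a scatter plot. The sketch below illustrates that idea and is not the expected answer. It assumes that "raw.data.txt" has a header row with two numeric, non-negative columns named exactly "A" and "B" (for example raw counts); the real column layout is not given in the task. All function names and the output file name "outputD.png" are hypothetical, and the plot function needs matplotlib, which is imported only when it is called.

```python
import math


def read_samples(path, col_a="A", col_b="B"):
    """Read two numeric columns from a tab-separated file with a header row."""
    a, b = [], []
    with open(path) as handle:
        header = handle.readline().rstrip("\n").split("\t")
        ia, ib = header.index(col_a), header.index(col_b)
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) <= max(ia, ib):
                continue  # skip blank or truncated lines
            a.append(float(fields[ia]))
            b.append(float(fields[ib]))
    return a, b


def log2p1(values):
    """log2(x + 1) transform; assumes non-negative values such as counts."""
    return [math.log2(v + 1.0) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x)
    vy = sum((yi - my) ** 2 for yi in y)
    return cov / math.sqrt(vx * vy)


def scatter_plot(x, y, out_path="outputD.png"):
    """Save a scatter plot of the transformed samples (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(x, y, s=5, alpha=0.5)
    ax.set_xlabel("log2(A + 1)")
    ax.set_ylabel("log2(B + 1)")
    ax.set_title(f"Pearson r = {pearson(x, y):.3f}")
    fig.savefig(out_path, dpi=150)
    plt.close(fig)


# Example call for the file named in the task:
# a, b = read_samples("raw.data.txt")
# scatter_plot(log2p1(a), log2p1(b))
```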

---

Good luck.