Overview
Read Clipping
Adaptor sequences are clipped from Repli-seq reads using cutadapt version 1.14. Specifically, we run:
cutadapt -q 0 -O 1 -m 0 -a <adaptor> <fastq>
- The -q 0 option turns off low-quality base removal before adaptor searching.
- The -O 1 option sets the minimum required overlap length between read end and adaptor to 1 (default is 3), in case the adaptor sequence only partially overlaps the read end rather than being fully contained in the read.
- The -m 0 option means that empty reads are kept and will appear in the output.
The adaptor sequence used is AGATCGGAAGAGCACACGTCTG.
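As a minimal sketch, the clipping step with the adaptor filled in could be wrapped in a shell function. The function name clip_reads, the file names, and the use of -o to name the output file are illustrative assumptions; only the -q, -O, -m and -a settings come from the command above.

```shell
# Sketch only: clip_reads and the file names are hypothetical.
ADAPTOR="AGATCGGAAGAGCACACGTCTG"

clip_reads() {
    in_fastq="$1"
    out_fastq="$2"
    # -q 0: no quality trimming before the adaptor search
    # -O 1: accept an adaptor match that overlaps the read end by 1 base
    # -m 0: keep reads that become empty after clipping
    # -o  : write clipped reads to a file instead of stdout (assumption)
    cutadapt -q 0 -O 1 -m 0 -a "$ADAPTOR" -o "$out_fastq" "$in_fastq"
}

# Example call (hypothetical file names):
# clip_reads sample.fastq.gz sample.clip.fastq.gz
```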
Alignment
Filtering
For filtering valid Repli-seq alignments, we use samtools. Specifically, the filtering workflow consists of the following steps:
- MAPQ filtering: the samtools view command with -q 20 is used to skip alignments with MAPQ smaller than 20.
- Sorting: the samtools sort command is used to sort alignments by genomic coordinates.
- Removal of PCR duplicates: the samtools rmdup command is used to remove duplicate alignments.
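The three steps above can be sketched as one shell function. This assumes samtools 1.x command syntax (sort -o) and single-end reads (rmdup -s); the function name, file names, and intermediate file are hypothetical, and the exact invocation inside the pipeline may differ.

```shell
# Sketch only: filter_alignments and all file names are hypothetical.
filter_alignments() {
    in_bam="$1"
    out_bam="$2"
    sorted_bam="${out_bam%.bam}.sorted.bam"   # intermediate file (assumption)

    # 1. MAPQ filtering: skip alignments with MAPQ < 20, keep BAM output (-b)
    # 2. Sorting: sort the surviving alignments by genomic coordinate
    samtools view -b -q 20 "$in_bam" \
        | samtools sort -o "$sorted_bam" -

    # 3. Duplicate removal; -s treats reads as single-end (assumption)
    samtools rmdup -s "$sorted_bam" "$out_bam"
}

# Example call (hypothetical file names):
# filter_alignments sample.bam sample.filtered.bam
```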
Binning and Aggregation
Filtered reads are aggregated for each 5 kb window using bedtools coverage. Specifically, the following command is used:
bedtools coverage -counts -sorted -a <BINFILE> -b <INPUT_BAM>
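The layout of <BINFILE> is not described here. Assuming it is a BED file of fixed 5 kb windows, listed in the same chromosome order as the sorted BAM file (which -sorted requires), a sketch of generating it from a two-column chromosome sizes file and running the count could look like this. make_bins, count_reads and all file names are hypothetical.

```shell
# Sketch only: helper names and file names are hypothetical.

# Print fixed-width windows (default 5000 bp) for each chromosome listed in
# a two-column "name<TAB>size" file; the last window is truncated at the
# chromosome end.
make_bins() {
    awk -v w="${2:-5000}" 'BEGIN { OFS = "\t" }
    {
        size = $2 + 0
        for (s = 0; s < size; s += w) {
            e = s + w
            if (e > size) e = size
            print $1, s, e
        }
    }' "$1"
}

# Count filtered alignments per window with the command shown above.
count_reads() {
    bedtools coverage -counts -sorted -a "$1" -b "$2"
}

# Example calls (hypothetical file names):
# make_bins genome.chrom.sizes 5000 > bins.5kb.bed
# count_reads bins.5kb.bed sample.filtered.bam > sample.counts.bedgraph
```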
Output is provided in both gzipped bedGraph and bigWig formats and can be viewed using HiGlass.
As of v16.1, the pipeline output includes a raw counts file in addition to the default scaled counts (RPKM).
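For reference, the usual RPKM definition divides a bin's read count by the bin length in kilobases and by the total read count in millions. The sketch below applies that definition to a four-column raw counts bedGraph. Which total the pipeline uses (for example, all filtered reads) is not stated here, so the total is passed in as an argument; counts_to_rpkm is a hypothetical helper, not part of the pipeline.

```shell
# Sketch only: counts_to_rpkm is a hypothetical helper.
# $1: bedGraph with columns chrom, start, end, raw count
# $2: total number of reads used for scaling (assumption: supplied by caller)
counts_to_rpkm() {
    awk -v total="$2" '{
        len = $3 - $2                        # bin length in bp
        rpkm = ($4 * 1e9) / (len * total)    # count / (len/1e3) / (total/1e6)
        printf "%s\t%d\t%d\t%.4f\n", $1, $2, $3, rpkm
    }' "$1"
}

# Example call (hypothetical file name and read total):
# counts_to_rpkm sample.counts.bedgraph 25000000 > sample.rpkm.bedgraph
```

With a 5 kb bin holding 10 reads and a total of 1,000,000 reads, this gives 10 / 5 / 1 = 2 RPKM.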
Source files
The pipeline components are pre-installed in a publicly available Docker image (4dndcic/4dn-repliseq:v16.1) on Docker Hub. The source code for the Docker image and the pipeline description in Common Workflow Language (CWL) can be found on GitHub.
- Latest version (v16.1)
  - Workflow metadata: https://data.4dnucleome.org/workflows/622bdf75-2dd1-457f-ad78-d4cd128f8f5b/
  - CWL: https://github.com/4dn-dcic/docker-4dn-repliseq/tree/v16.1/cwl
  - Docker: https://github.com/4dn-dcic/docker-4dn-repliseq/tree/v16.1
- Older versions
  - v16
    - Workflow metadata: https://data.4dnucleome.org/workflows/2a6807f1-93db-4c7b-b148-672534193974/
    - CWL: https://github.com/4dn-dcic/docker-4dn-repliseq/tree/v16/cwl
    - Docker: https://github.com/4dn-dcic/docker-4dn-repliseq/tree/v16
  - v14
    - Workflow metadata: https://data.4dnucleome.org/workflows/4459a4d8-1bd8-4b6a-b2cc-2506f4270a34/
    - CWL: https://github.com/4dn-dcic/docker-4dn-repliseq/tree/v14/cwl
    - Docker: https://github.com/4dn-dcic/docker-4dn-repliseq/tree/v14
  - v13.1
    - Workflow metadata: https://data.4dnucleome.org/workflows/146da22a-502d-4500-bf57-a7cf0b4b2364/
    - CWL: https://github.com/4dn-dcic/docker-4dn-repliseq/tree/v13.1/cwl
    - Docker: https://github.com/4dn-dcic/docker-4dn-repliseq/tree/v13.1