cleanUrl: "hic-processing-pipeline"
description: "A walkthrough of the Hi-C data processing pipeline."
Let's follow the 4DN processing pipeline.
GitHub - 4dn-dcic/docker-4dn-hic at v43
https://micro-c.readthedocs.io/en/latest/contact_map.html
bwa mem -SP5M -t<nthreads> <genome_index> <fastq1> <fastq2>
-SP option is used to ensure the results are equivalent to that obtained by running bwa mem on each mate separately, while retaining the right formatting for paired-end reads. This option skips a step in bwa mem that forces alignment of a poorly aligned read given an alignment of its mate with the assumption that the two mates are part of a single genomic segment.
-S: skip mate rescue.
-P: skip pairing; mate rescue performed unless -S also in use.
-S prevents this mate rescue from being performed. (Reportedly, though, there is no way to tell from the output which reads were mapped by mate rescue.)
Using -S and -P together means that, although the data are paired-end, alignment is effectively performed as if each read were single-end.
-5 option is used to report the 5' portion of chimeric alignments as the primary alignment. In Hi-C experiments, when a mate has chimeric alignments, typically, the 5' portion is the position of interest, while the 3' portion represents the same fragment as the mate. For chimeric alignments, bwa mem reports two alignments: one of them is annotated as primary and soft-clipped, retaining the full length of the original sequence. The other end is annotated as hard-clipped and marked as either 'supplementary' or 'secondary'. The -5 option forces the 5' end to be always annotated as primary.
-M option is used to annotate the secondary/supplementary clipped reads as secondary rather than supplementary, for compatibility with some public software tools such as picard MarkDuplicates.
-t option is used for multi-threading and should not affect the result.
https://github.com/open2c/pairtools
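pairtools reads the SAM records that bwa mem writes, so it can help to see how the primary / secondary / supplementary distinction controlled by -5 and -M is encoded. The following is a minimal sketch, assuming the standard SAM FLAG bits (0x100 for secondary, 0x800 for supplementary). The function name and the example FLAG values are illustrative and are not taken from bwa or pairtools.

```python
# Standard SAM FLAG bits, as defined in the SAM specification.
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def classify_alignment(flag: int) -> str:
    """Classify one SAM record as primary, secondary, or supplementary."""
    if flag & FLAG_SUPPLEMENTARY:
        return "supplementary"
    if flag & FLAG_SECONDARY:
        return "secondary"
    return "primary"


# Hypothetical FLAG values for the segments of a chimeric read 1:
#   65   = paired (0x1) + first in pair (0x40)
#   321  = 65 + secondary (0x100)
#   2113 = 65 + supplementary (0x800)
for flag in (65, 321, 2113):
    print(flag, classify_alignment(flag))
```

Under this reading, -M would turn a record like 2113 into one like 321, and -5 decides which segment of the chimeric read receives the primary-style flag.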
parse: read .sam files produced by bwa and form Hi-C pairs
sort: sort pairs files (the lexicographic order for chromosomes, the numeric order for the positions, the lexicographic order for pair types)
dedup: remove PCR duplicates from a sorted triu-flipped .pairs file
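The flip, sort, and dedup ideas above can be sketched in a few lines of Python. This is a simplified illustration, not the pairtools implementation: it assumes chromosomes are compared lexicographically when flipping, it leaves pair_type unchanged on flip, and it treats only exact coordinate-and-strand matches as duplicates. The `Pair` record and all function names are hypothetical.

```python
from typing import List, NamedTuple


class Pair(NamedTuple):
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int
    strand1: str
    strand2: str
    pair_type: str


def flip(p: Pair) -> Pair:
    """Swap sides so that (chrom1, pos1) <= (chrom2, pos2), i.e. upper-triangular."""
    if (p.chrom1, p.pos1) > (p.chrom2, p.pos2):
        return Pair(p.chrom2, p.pos2, p.chrom1, p.pos1,
                    p.strand2, p.strand1, p.pair_type)
    return p


def sort_pairs(pairs: List[Pair]) -> List[Pair]:
    """Chromosomes lexicographic, positions numeric, pair types lexicographic."""
    return sorted(pairs, key=lambda p: (p.chrom1, p.chrom2, p.pos1, p.pos2, p.pair_type))


def dedup_exact(sorted_pairs: List[Pair]) -> List[Pair]:
    """Keep the first pair for each exact (position, strand) combination."""
    seen = set()
    kept = []
    for p in sorted_pairs:
        key = (p.chrom1, p.pos1, p.strand1, p.chrom2, p.pos2, p.strand2)
        if key not in seen:
            seen.add(key)
            kept.append(p)
    return kept


# Toy data: the first two records are the same contact written in opposite orientations.
pairs = [
    Pair("chr2", 500, "chr1", 100, "-", "+", "UU"),
    Pair("chr1", 100, "chr2", 500, "+", "-", "UU"),
    Pair("chr1", 50, "chr1", 900, "+", "+", "UU"),
]
unique = dedup_exact(sort_pairs([flip(p) for p in pairs]))
print(len(unique))
```

Flipping first is what lets the two orientations of the same contact collapse into one key, which is why dedup expects a sorted, triu-flipped input.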
markasdup: simply changes the type of every pair in the .pairs file to DD.
select: Select certain types of pairs based on their properties
pairtools select "mapq1>0 and mapq2>0" test.pairs.gz -o test.UU.pairs.gz
or
pairtools select '(pair_type=="UU")' test.pairs.gz -o test.UU.pairs.gz
pairtools select '(pair_type=="UU") or (pair_type=="UC")' test.pairs.gz -o test.UU_UC.pairs.gz
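To make the filtering conditions concrete, here is a small Python sketch that applies the same three predicates to in-memory records. It only mimics the idea of select. The `select` helper, the dictionary layout, and the toy MAPQ values are assumptions for illustration.

```python
def select(rows, predicate):
    """Return the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


# Toy records; pair types are the ones mentioned in this note.
rows = [
    {"read_id": "r1", "pair_type": "UU", "mapq1": 60, "mapq2": 60},
    {"read_id": "r2", "pair_type": "UC", "mapq1": 60, "mapq2": 12},
    {"read_id": "r3", "pair_type": "DD", "mapq1": 60, "mapq2": 60},
]

uu = select(rows, lambda r: r["pair_type"] == "UU")
uu_uc = select(rows, lambda r: r["pair_type"] in ("UU", "UC"))
mapped = select(rows, lambda r: r["mapq1"] > 0 and r["mapq2"] > 0)

print([r["read_id"] for r in uu])
print([r["read_id"] for r in uu_uc])
print([r["read_id"] for r in mapped])
```

In this toy data the MAPQ-based predicate also keeps the DD record, so the MAPQ condition and the pair_type condition are not interchangeable in general.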
stats: Describe the types and distance properties of pairs
pairtools stats test.pairs.gz -o test.stats
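As a rough picture of what such a summary contains, the sketch below counts pair types, cis versus trans contacts, and cis contacts separated by at least 1 kb. It is not the pairtools code. The key names are loosely modeled on a stats file and should be treated as assumptions, as should the `summarize` function and the toy input.

```python
from collections import Counter


def summarize(pairs):
    """Count pair types, cis/trans contacts, and cis contacts at >= 1 kb."""
    stats = Counter()
    for chrom1, pos1, chrom2, pos2, pair_type in pairs:
        stats["total"] += 1
        stats[f"pair_types/{pair_type}"] += 1
        if chrom1 != chrom2:
            stats["trans"] += 1
        else:
            stats["cis"] += 1
            if abs(pos2 - pos1) >= 1000:
                stats["cis_1kb+"] += 1
    return dict(stats)


# Toy input: (chrom1, pos1, chrom2, pos2, pair_type)
toy_pairs = [
    ("chr1", 100, "chr1", 5000, "UU"),
    ("chr1", 100, "chr1", 300, "UU"),
    ("chr1", 100, "chr2", 300, "UU"),
    ("chr1", 100, "chr1", 100, "DD"),
]
summary = summarize(toy_pairs)
for key in sorted(summary):
    print(key, summary[key])
```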