Variants
Module genomvar.variant contains classes representing genomic alterations.
The hierarchy of the classes used in the package is the following:
VariantBase
/ | \
AmbigIndel <-Indel V |
| | / \ MNP V
| ----+--- Del Ins | Haplotype
| | | | V
V V | V SNP
AmbigDel -----> AmbigIns
All variants have start and end attributes, which define the range they act on and allow variants to be searched for by overlap.
To test whether a variant is an instance of some type, the is_variant_instance() method can be used. Variant equality can be tested using edit_equal().
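The overlap test on start and end can be illustrated with a minimal sketch. It assumes 0-based coordinates with the end position excluded, which is what the examples on this page suggest. The helper name overlaps is hypothetical and is not part of genomvar.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """Return True if half-open ranges [start_a, end_a) and [start_b, end_b) share a position."""
    return start_a < end_b and start_b < end_a


# A variant spanning 154678-154680 overlaps a range that starts at 154679 ...
print(overlaps(154678, 154680, 154679, 154690))  # True
# ... but not a range that starts exactly at its excluded end position.
print(overlaps(154678, 154680, 154680, 154690))  # False
```

With end-exclusive ranges, two directly adjacent variants do not count as overlapping.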
Objects can be instantiated directly, e.g.:
>>> from genomvar import variant
>>> vrt = variant.MNP('chr1',154678,'GT')
>>> print(vrt)
<MNP chr1:154678-154680 NN/GT>
This creates an MNP that substitutes GT for the nucleotides at positions 154678 and 154679 of chromosome 1.
Alternatively, variants can be created using VariantFactory objects. This class can work with VCF-like notation. For example,
>>> from genomvar.variant import VariantFactory
>>> fac = VariantFactory()
>>> vrt = fac.from_edit('chr15',575,'TA','T')
>>> print(vrt)
<Del chr15:576-577 A/->
Positions are 0-based, so this creates a deletion of the nucleotide at 0-based position 576, which is position 577 of chromosome 15 in 1-based numbering.
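How a VCF-like edit maps to the deletion coordinates printed above can be sketched as follows. This is only an illustration of stripping the leading bases shared by REF and ALT; it is not how VariantFactory is implemented, and the helper name trim_edit is hypothetical.

```python
def trim_edit(pos, ref, alt):
    """Strip the leading bases shared by a VCF-like REF/ALT pair.

    Returns (start, end, ref, alt), where start and end are 0-based,
    end is excluded, and the range covers the remaining REF bases.
    """
    shared = 0
    while shared < min(len(ref), len(alt)) and ref[shared] == alt[shared]:
        shared += 1
    start = pos + shared
    ref, alt = ref[shared:], alt[shared:]
    return start, start + len(ref), ref, alt


# The edit from the example above: position 575, REF "TA", ALT "T".
print(trim_edit(575, "TA", "T"))  # (576, 577, 'A', '')
```

The shared leading T is dropped, leaving a single deleted A at 576-577.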
Alternatively, a limited subset of HGVS notation is supported (numbering in HGVS strings is 1-based, following the spec):
>>> vrt = fac.from_hgvs('chr1:g.15C>A')
>>> print(vrt)
<SNP chr1:14 C/A>
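The shift from 1-based HGVS numbering to the 0-based position shown in the output can be sketched with a small parser. It handles only simple genomic substitutions of the form used above. The pattern and the function name parse_hgvs_substitution are assumptions made for illustration, not genomvar code.

```python
import re

# Matches a simple genomic substitution such as "chr1:g.15C>A".
HGVS_SUBSTITUTION = re.compile(
    r"^(?P<chrom>[^:]+):g\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_hgvs_substitution(hgvs):
    """Return (chrom, 0-based position, ref, alt) for a simple substitution."""
    match = HGVS_SUBSTITUTION.match(hgvs)
    if match is None:
        raise ValueError("unsupported HGVS string: %r" % hgvs)
    # HGVS positions are 1-based; subtract 1 to get a 0-based position.
    return (
        match.group("chrom"),
        int(match.group("pos")) - 1,
        match.group("ref"),
        match.group("alt"),
    )


print(parse_hgvs_substitution("chr1:g.15C>A"))  # ('chr1', 14, 'C', 'A')
```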
Variant sets defined in genomvar.varset use class GenomVariant. Objects of this class contain a genomic alteration (attribute base) and, optionally, a genotype (attribute GT) and other attributes commonly found in VCF files (attribute attrib). Attribute base is an object of some VariantBase subclass (SNPs, Deletions etc.).
Variant classes
The basic classes are SNP, representing a single-nucleotide polymorphism, and its generalization MNP, a multiple-nucleotide polymorphism.
class genomvar.variant.SNP(chrom, start, alt, end=None, ref=None)
Single-nucleotide polymorphism.
For instantiation it requires chromosome, position and alternative sequence.
>>> from genomvar import variant
>>> variant.SNP('chr1',154678,'T')
SNP("chr1",154678,"T")
class genomvar.variant.MNP(chrom, start, alt, end=None, ref=None)
Multiple-nucleotide polymorphism. Substitutes N nucleotides of the reference for N other nucleotides.
For instantiation it requires chromosome, position and alternative sequence. end will be inferred from start and alt. ref is also optional.
>>> from genomvar import variant
>>> variant.MNP('chr1',154678,'GT')
MNP("chr1",154678,"GT")
There are separate classes for insertions and deletions, both inheriting from the abstract class Indel.
class genomvar.variant.Ins(chrom, start, alt, end=None, ref=None)
Insertion of nucleotides. For instantiation chrom, start and inserted sequence (alt) are required.
Start and end denote the nucleotide after the inserted sequence, i.e. start is the 0-based number of the nucleotide after the insertion, and end is start+1 by definition.
>>> from genomvar.variant import Ins
>>> # Insertion of TA before position chr2:100543.
>>> print(Ins('chr2',100543,'TA'))
<Ins chr2:100543 -/TA>
class genomvar.variant.Del(chrom, start, end, ref=None, alt=None)
Deletion of nucleotides. For instantiation chrom, start (0-based), and end (position end is excluded) are required.
>>> from genomvar.variant import Del
>>> # Deletion of 3 nucleotides starting at chr3:7843488 (0-based)
>>> print(Del('chr3',7843488,7843488+3))
<Del chr3:7843488-7843491 NNN/->
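The coordinate conventions of MNP, Ins and Del can be made concrete by applying each kind of edit to a plain string. This is a sketch under the conventions described above (0-based start, excluded end, insertion placed before start). The three helper functions are hypothetical and not part of genomvar.

```python
def apply_mnp(seq, start, alt):
    """Replace len(alt) nucleotides beginning at 0-based start with alt."""
    end = start + len(alt)  # end is inferred from start and alt
    return seq[:start] + alt + seq[end:]


def apply_ins(seq, start, alt):
    """Insert alt before the nucleotide at 0-based position start."""
    return seq[:start] + alt + seq[start:]


def apply_del(seq, start, end):
    """Delete nucleotides from 0-based start up to, but excluding, end."""
    return seq[:start] + seq[end:]


seq = "ACGTACGT"
print(apply_mnp(seq, 2, "TT"))  # ACTTACGT
print(apply_ins(seq, 2, "TA"))  # ACTAGTACGT
print(apply_del(seq, 2, 5))     # ACCGT
```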
There is a special flavor of indels for cases when an indel can be applied in several places resulting in the same alternate sequence; these are referred to as ambiguous indels. They are AmbigDel and AmbigIns, which on top of the regular deletion or insertion attributes contain information about the region they can be applied to. For instantiation, a VariantFactory with a reference is needed.
class genomvar.variant.AmbigIns(chrom, start, end, alt, ref=None)
Class representing an indel whose position is ambiguous. Ambiguity means the indel could be applied at any position of some region, resulting in the same alternative sequence.
Let the reference file test.fasta contain a toy sequence:
>seq1
TTTAATA
Consider a variant extending the 3 Ts at the beginning by one more T. It can be done in several places, so the corresponding insertion can be given as an AmbigIns object:
>>> from genomvar import Reference
>>> from genomvar.variant import VariantFactory
>>> fac = VariantFactory(Reference('test.fasta'),normindel=True)
>>> print( fac.from_edit('seq1',0,'T','TT') )
<AmbigIns seq1:0-4(1-2) -/T>
Positions 1 and 2 are the actual start and end, meaning that T is inserted before the nucleotide located at 1-2. Positions 0-4 indicate that start and end can be extended to these values, resulting in the same alteration.
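The 0-4 range in this example can be reproduced with a brute-force sketch: try every insertion point and keep those where an insertion of the same length yields the same altered sequence. This only illustrates the idea of ambiguity. It is not the algorithm genomvar uses, and insertion_range is a hypothetical name.

```python
def insertion_range(seq, start, alt):
    """Return (low, high) covering all equivalent insertion points.

    An insertion point p is equivalent if inserting len(alt) nucleotides
    before 0-based position p can produce the same altered sequence.
    high is the last equivalent point plus one, mirroring end = start + 1.
    """
    target = seq[:start] + alt + seq[start:]
    n = len(alt)
    points = [
        p
        for p in range(len(seq) + 1)
        # Sequence before p and after the inserted stretch must be unchanged.
        if target[:p] == seq[:p] and target[p + n:] == seq[p:]
    ]
    return points[0], points[-1] + 1


# Toy reference from above; T inserted before position 1.
print(insertion_range("TTTAATA", 1, "T"))  # (0, 4)
```

The check compares the flanks rather than the inserted string itself, so it also covers multi-base repeats, where the inserted sequence appears rotated at intermediate positions.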
class genomvar.variant.AmbigDel(chrom, start, end, ref=None, alt=None)
Class representing a deletion whose position is ambiguous. Ambiguity means the same number of positions could be deleted within some range, resulting in the same alternative sequence.
Let the reference file test.fasta contain a toy sequence:
>seq1
TCTTTTTGACTGG
>>> from genomvar import Reference
>>> from genomvar.variant import VariantFactory
>>> fac = VariantFactory(Reference('test.fasta'),normindel=True)
>>> print( fac.from_edit('seq1',1,'CTTTTTGAC','C') )
<AmbigDel seq1:1-11(2-10) TTTTTGAC/->
The deletion of TTTTTGAC starts at position 2 and ends at the 9th nucleotide inclusive, which gives the range 2-10. The values 1-11 denote that start and end can be extended to these positions, resulting in the same alteration.
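The 1-11 range can be checked in the same brute-force way: slide a window of the deleted length along the reference and keep every window whose removal gives the same result. This is a sketch for illustration only, not genomvar's implementation, and deletion_range is a hypothetical name.

```python
def deletion_range(seq, start, end):
    """Return (low, high) spanning all equivalent deletions.

    A deletion of the same length beginning at 0-based position s is
    equivalent if removing it leaves the same sequence. low is the first
    such s and high is the excluded end of the last equivalent window.
    """
    n = end - start
    target = seq[:start] + seq[end:]
    starts = [
        s
        for s in range(len(seq) - n + 1)
        if seq[:s] + seq[s + n:] == target
    ]
    return starts[0], starts[-1] + n


# Toy reference from above; deletion of TTTTTGAC at 2-10.
print(deletion_range("TCTTTTTGACTGG", 2, 10))  # (1, 11)
```

Here the windows starting at 1, 2 and 3 all leave TCTGG, so the range runs from 1 to 3 + 8 = 11.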
There is a separate class Haplotype which can hold any combination of objects of the variant classes above.
class genomvar.variant.Haplotype(chrom, variants)
An object representing genome variants on the same chromosome (or contig).
Can be instantiated from a list of GenomVariant objects using the Haplotype.from_variants() class method.
Additionally, the module contains two technically driven types: Null (no variant at all, i.e. the reference) and Asterisk (for * in the ALT field of VCF files).