#include <Aligner.h>
[Inheritance diagram for mems::Aligner]

Public Member Functions | |
| void | align (MatchList &mlist, IntervalList &interval_list, double LCB_minimum_density, double LCB_minimum_range, boolean recursive, boolean extend_lcbs, boolean gapped_alignment, std::string tree_filename="") |
| Note: this algorithm differs from the one reported in the Mauve paper. The modifications should make the Mauve method more sensitive. Given an initial set of multi-MUMs, the alignment is an eight-step process: 1) Eliminate overlaps among the multi-MUMs. 2) Compute a phylogenetic guide tree using the multi-MUMs. 3) Remove subset multi-MUMs. 4) Identify regions of collinearity (LCBs) among the remaining n-way multi-MUMs. 5) Perform recursive anchor search within and outside LCBs: 5a) search outside until the weight stabilizes; 5b) search within LCBs. 6) Use greedy breakpoint elimination to remove low-weight LCBs: 6a) whenever two LCBs coalesce, search the intervening region for multi-MUMs. 7) Repeat 4, 5 and 6 until the total weight stabilizes. 8) Perform gapped alignment on each LCB. When limited area DP and POA are integrated, step 8 will become step 5c. | |
| Aligner (const Aligner &al) | |
| Aligner (uint seq_count) | |
| Constructs an aligner for the specified number of sequences. | |
| void | AlignLCB (MatchList &mlist, Interval &iv) |
| void | DoSomethingCool (MatchList &mlist, Interval &iv) |
| void | GetBestLCB (MatchList &r_list, MatchList &best_lcb) |
| Aligner & | operator= (const Aligner &al) |
| void | Recursion (MatchList &r_list, Match *r_begin, Match *r_end, boolean nway_only=false) |
| void | RecursiveAnchorSearch (MatchList &mlist, gnSeqI minimum_weight, std::vector< MatchList > &LCB_list, boolean entire_genome, std::ostream *status_out=NULL) |
| void | SearchWithinLCB (MatchList &mlist, std::vector< search_cache_t > &new_cache, bool leftmost=false, bool rightmost=false) |
| void | SetGappedAligner (GappedAligner &gal) |
| void | SetMaxExtensionIterations (uint ext_iters) |
| void | SetMaxGappedAlignmentLength (gnSeqI len) |
| Forwards the request to whatever gapped aligner is being used. | |
| void | SetMinRecursionGapLength (gnSeqI min_r_gap) |
| Set the minimum size of intervening region between two anchor matches that will be considered for recursive anchor determination. | |
| void | SetPermutationOutput (std::string &permutation_filename, int64 permutation_weight) |
| Set output parameters for permutation matrices. | |
| void | SetRecursive (bool value) |
| void | WritePermutation (std::vector< LCB > &adjacencies, std::string out_filename) |
Protected Member Functions | |
| void | consistencyCheck (uint lcb_count, std::vector< LCB > &adjacencies, std::vector< MatchList > &lcb_list, std::vector< int64 > &weights) |
Protected Attributes | |
| boolean | collinear_genomes |
| Set to true if all genomes are assumed to be collinear. | |
| int64 | cur_min_coverage |
| Tracks the weight of the lowest-weight LCB. | |
| boolean | currently_recursing |
| True when the recursive search has begun. | |
| boolean | debug |
| Flag for debugging output. | |
| boolean | extend_lcbs |
| Set to true if LCB extension should be attempted. | |
| GappedAligner * | gal |
| TLS< MemHash > | gap_mh |
| Used during recursive alignment. | |
| boolean | gapped_alignment |
| Set to true to complete a gapped alignment. | |
| double | LCB_minimum_density |
| double | LCB_minimum_range |
| uint | max_extension_iters |
| Maximum number of attempts at LCB extension. | |
| gnSeqI | min_recursive_gap_length |
| Minimum size of gap regions that will be recursed on. | |
| MaskedMemHash | nway_mh |
| Used during recursive alignment to find n-way matches only. | |
| std::string | permutation_filename |
| int64 | permutation_weight |
| boolean | recursive |
| Set to true if a recursive anchor search/gapped alignment should be performed. | |
| std::vector< search_cache_t > | search_cache |
| A list of recursive searches that have already been done. | |
| uint32 | seq_count |
| The number of sequences this aligner is working with. | |
LCB lists are typically stored using the IntervalList class. They can be read and written in interval format using that class. For input and output of gapped alignments in other formats, see the gnAlignedSequences class. Other methods in this class are available for experimentation.
Definition at line 142 of file Aligner.h.

mems::Aligner::Aligner (uint seq_count)
Constructs an aligner for the specified number of sequences.
Definition at line 180 of file Aligner.cpp. References mems::default_min_r_gap_size and uint.

mems::Aligner::Aligner (const Aligner &al)
Definition at line 191 of file Aligner.cpp.

void mems::Aligner::align (MatchList &mlist, IntervalList &interval_list, double LCB_minimum_density, double LCB_minimum_range, boolean recursive, boolean extend_lcbs, boolean gapped_alignment, std::string tree_filename="")
Note: this algorithm differs from the one reported in the Mauve paper. The modifications should make the Mauve method more sensitive. Given an initial set of multi-MUMs, the alignment is an eight-step process:
1) Eliminate overlaps among the multi-MUMs
2) Compute a phylogenetic guide tree using the multi-MUMs
3) Remove subset multi-MUMs
4) Identify regions of collinearity (LCBs) among the remaining n-way multi-MUMs
5) Perform recursive anchor search within and outside LCBs
   5a) search outside until weight stabilizes
   5b) search within LCBs
6) Use greedy breakpoint elimination to remove low-weight LCBs
   6a) whenever two LCBs coalesce, search the intervening region for multi-MUMs
7) Repeat 4, 5 and 6 until the total weight stabilizes
8) Perform gapped alignment on each LCB
When limited area DP and POA are integrated, step 8 will become step 5c.
Takes a MatchList as input and outputs a list of LCBs as an IntervalList. Several of the options can be used to filter out unlikely LCBs. If the recursive option is specified, the regions between matches in each LCB are searched for further homology and a full gapped alignment is produced.
Definition at line 2192 of file Aligner.cpp. References mems::addUnalignedIntervals(), mems::AlignLCBInParallel(), collinear_genomes, CreateTempFileName(), mems::MuscleInterface::CreateTree(), mems::AlnProgressTracker::cur_leftend, currently_recursing, mems::DistanceMatrix(), mems::EliminateOverlaps(), gal, mems::Interval, mems::IntervalList, mems::MatchList, mems::GenericMatchList< MatchPtrType >::MultiplicityFilter(), nway_mh, mems::AlnProgressTracker::prev_progress, RecursiveAnchorSearch(), seq_count, mems::GenericMatchList< MatchPtrType >::seq_filename, mems::GenericIntervalList< MatchType >::seq_filename, mems::GenericIntervalList< MatchType >::seq_table, mems::GenericMatchList< MatchPtrType >::seq_table, mems::MaskedMemHash::SetMask(), mems::GenericInterval< GappedBaseImpl >::SetMatches(), mems::AlnProgressTracker::total_len, uint, and mems::uint64.
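
To make steps 4 and 6 of the list above more concrete, here is a minimal two-genome sketch of collinear grouping followed by greedy removal of low-weight blocks. It is not the libMems implementation: the Anchor and Block types, both function names, and the use of summed anchor length as block weight are assumptions made for illustration, and step 6a (searching the region between coalesced LCBs) is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <numeric>
#include <vector>

// Hypothetical two-genome anchor. Positions are 1-based; a negative start1
// marks a match on the reverse strand of genome 1.
struct Anchor {
    std::int64_t start0;
    std::int64_t start1;
    std::int64_t length;
};

// Hypothetical collinear block; weight is assumed to be the summed anchor length.
struct Block {
    std::vector<Anchor> anchors;
    std::int64_t weight = 0;
};

// Step 4 (simplified): group anchors into maximal runs that are adjacent in
// both genomes and share an orientation.
std::vector<Block> findCollinearBlocks(std::vector<Anchor> anchors) {
    std::sort(anchors.begin(), anchors.end(),
              [](const Anchor& a, const Anchor& b) { return a.start0 < b.start0; });
    const std::size_t n = anchors.size();
    // rank1[i] is the position of anchor i when ordered along genome 1.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return std::abs(anchors[a].start1) < std::abs(anchors[b].start1);
    });
    std::vector<std::size_t> rank1(n);
    for (std::size_t r = 0; r < n; ++r) rank1[order[r]] = r;

    std::vector<Block> blocks;
    for (std::size_t i = 0; i < n; ++i) {
        assert(anchors[i].start0 > 0 && anchors[i].start1 != 0 && anchors[i].length > 0);
        bool extend = false;
        if (i > 0) {
            const bool forward = anchors[i - 1].start1 > 0 && anchors[i].start1 > 0 &&
                                 rank1[i] == rank1[i - 1] + 1;
            const bool reverse = anchors[i - 1].start1 < 0 && anchors[i].start1 < 0 &&
                                 rank1[i] + 1 == rank1[i - 1];
            extend = forward || reverse;
        }
        if (!extend) blocks.emplace_back();  // a breakpoint starts a new block
        blocks.back().anchors.push_back(anchors[i]);
        blocks.back().weight += anchors[i].length;
    }
    return blocks;
}

// Step 6 (simplified): repeatedly drop the lightest block while it is below
// minWeight, regrouping each time so that neighbouring blocks can coalesce.
std::vector<Block> greedyBreakpointElimination(std::vector<Anchor> anchors,
                                               std::int64_t minWeight) {
    for (;;) {
        std::vector<Block> blocks = findCollinearBlocks(anchors);
        auto lightest = std::min_element(
            blocks.begin(), blocks.end(),
            [](const Block& a, const Block& b) { return a.weight < b.weight; });
        if (lightest == blocks.end() || lightest->weight >= minWeight) return blocks;
        anchors.clear();
        for (auto it = blocks.begin(); it != blocks.end(); ++it)
            if (it != lightest)
                anchors.insert(anchors.end(), it->anchors.begin(), it->anchors.end());
    }
}
```

In this toy model, removing a short spurious anchor that sits between two otherwise collinear runs lets the two runs merge into a single block on the next pass, which is the coalescing behaviour step 6a refers to.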

void mems::Aligner::AlignLCB (MatchList &mlist, Interval &iv)
Definition at line 1370 of file Aligner.cpp. References mems::GappedAligner::Align(), collinear_genomes, debug, gal, mems::Interval, mems::Match, mems::MatchList, seq_count, mems::GenericMatchList< MatchPtrType >::seq_table, mems::GenericInterval< GappedBaseImpl >::SetMatches(), and uint.
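
void mems::Aligner::consistencyCheck (uint lcb_count, std::vector< LCB > &adjacencies, std::vector< MatchList > &lcb_list, std::vector< int64 > &weights) [protected]
Definition at line 1585 of file Aligner.cpp. References mems::AaronsLCB(), mems::filterMatches(), mems::MatchList, and uint.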

Aligner & mems::Aligner::operator= (const Aligner &al)
Definition at line 207 of file Aligner.cpp. References collinear_genomes, cur_min_coverage, debug, gal, gap_mh, LCB_minimum_density, LCB_minimum_range, max_extension_iters, min_recursive_gap_length, nway_mh, permutation_filename, permutation_weight, and seq_count.

void mems::Aligner::SearchWithinLCB (MatchList &mlist, std::vector< search_cache_t > &new_cache, bool leftmost=false, bool rightmost=false)
Definition at line 1472 of file Aligner.cpp. References mems::cache_comparator, debug, mems::EliminateOverlaps(), mems::Match, mems::MatchList, mems::GenericMatchList< MatchPtrType >::MultiplicityFilter(), Recursion(), search_cache, mems::search_cache_t, seq_count, and uint. Referenced by RecursiveAnchorSearch().

void mems::Aligner::SetGappedAligner (GappedAligner &gal)
Definition at line 235 of file Aligner.cpp.

void mems::Aligner::SetMaxExtensionIterations (uint ext_iters)
Definition at line 186 of file Aligner.h. References max_extension_iters and uint.

void mems::Aligner::SetMaxGappedAlignmentLength (gnSeqI len)
Forwards the request to whatever gapped aligner is being used.
Definition at line 239 of file Aligner.cpp. References gal and mems::GappedAligner::SetMaxAlignmentLength().
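
The forwarding described for SetMaxGappedAlignmentLength can be pictured with a small stand-alone sketch. ToyGappedAligner and ToyAligner are hypothetical stand-ins, not the libMems classes; only the method names SetGappedAligner, SetMaxGappedAlignmentLength and SetMaxAlignmentLength are taken from this page, and the null-pointer precondition is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a gapped aligner back end.
struct ToyGappedAligner {
    std::uint64_t maxLength = 0;
    void SetMaxAlignmentLength(std::uint64_t len) { maxLength = len; }
};

// Hypothetical stand-in for the aligner front end: it stores no length limit
// of its own and simply forwards the setting to the attached back end.
class ToyAligner {
public:
    void SetGappedAligner(ToyGappedAligner& g) { gal = &g; }

    void SetMaxGappedAlignmentLength(std::uint64_t len) {
        assert(gal != nullptr);  // assumed precondition: a back end is attached
        gal->SetMaxAlignmentLength(len);
    }

private:
    ToyGappedAligner* gal = nullptr;  // non-owning pointer, as the gal member suggests
};
```

Because the limit lives in the back end, attaching a different gapped aligner afterwards would require setting the limit again in this sketch.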

void mems::Aligner::SetMinRecursionGapLength (gnSeqI min_r_gap)
Set the minimum size of the intervening region between two anchor matches that will be considered for recursive anchor determination. When the gap between two anchors is smaller than this cutoff value, the region is handed off to the dynamic programming aligner, e.g. ClustalW.
Definition at line 231 of file Aligner.cpp. References min_recursive_gap_length.
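
The cutoff rule for SetMinRecursionGapLength amounts to a per-gap decision, sketched below. GapAction, AnchorSpan, classifyGap and planGaps are hypothetical names for illustration; the sketch only decides which strategy applies and does not call any aligner.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical labels for the two ways a gap between anchors can be handled.
enum class GapAction { RecursiveAnchorSearch, DynamicProgramming };

// Gaps shorter than the cutoff go to the dynamic programming aligner;
// all other gaps remain candidates for recursive anchor search.
GapAction classifyGap(std::uint64_t gapLength, std::uint64_t minRecursiveGapLength) {
    return gapLength < minRecursiveGapLength ? GapAction::DynamicProgramming
                                             : GapAction::RecursiveAnchorSearch;
}

// Hypothetical anchor on a single sequence: start coordinate and length.
struct AnchorSpan {
    std::uint64_t start;
    std::uint64_t length;
};

// Classify the region between each pair of consecutive anchors.
// Assumes the anchors are sorted by start and do not overlap.
std::vector<GapAction> planGaps(const std::vector<AnchorSpan>& anchors,
                                std::uint64_t minRecursiveGapLength) {
    std::vector<GapAction> plan;
    for (std::size_t i = 1; i < anchors.size(); ++i) {
        const std::uint64_t prevEnd = anchors[i - 1].start + anchors[i - 1].length;
        assert(anchors[i].start >= prevEnd);
        plan.push_back(classifyGap(anchors[i].start - prevEnd, minRecursiveGapLength));
    }
    return plan;
}
```

A larger cutoff sends more regions straight to dynamic programming, while a smaller one lets the anchor search recurse into shorter gaps.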

void mems::Aligner::SetPermutationOutput (std::string &permutation_filename, int64 permutation_weight)
Set output parameters for permutation matrices.
Definition at line 592 of file Aligner.cpp.

void mems::Aligner::WritePermutation (std::vector< LCB > &adjacencies, std::string out_filename)
Definition at line 1886 of file Aligner.cpp. References seq_count and uint. Referenced by RecursiveAnchorSearch().

boolean mems::Aligner::collinear_genomes [protected]
Set to true if all genomes are assumed to be collinear.
Definition at line 222 of file Aligner.h. Referenced by align(), AlignLCB(), and operator=().

int64 mems::Aligner::cur_min_coverage [protected]
Tracks the weight of the lowest-weight LCB.
Definition at line 212 of file Aligner.h. Referenced by operator=() and RecursiveAnchorSearch().

boolean mems::Aligner::currently_recursing [protected]
True when the recursive search has begun.
Definition at line 221 of file Aligner.h. Referenced by align() and RecursiveAnchorSearch().

boolean mems::Aligner::debug [protected]
Flag for debugging output.
Reimplemented in mems::ProgressiveAligner.
Definition at line 205 of file Aligner.h. Referenced by AlignLCB(), operator=(), Recursion(), and SearchWithinLCB().

boolean mems::Aligner::extend_lcbs [protected]
Set to true if LCB extension should be attempted.
Definition at line 219 of file Aligner.h. Referenced by RecursiveAnchorSearch().

GappedAligner * mems::Aligner::gal [protected]
Definition at line 224 of file Aligner.h. Referenced by align(), AlignLCB(), operator=(), and SetMaxGappedAlignmentLength().

TLS< MemHash > mems::Aligner::gap_mh [protected]
Used during recursive alignment.
Definition at line 202 of file Aligner.h. Referenced by operator=() and Recursion().

boolean mems::Aligner::gapped_alignment [protected]
Set to true to complete a gapped alignment.

double mems::Aligner::LCB_minimum_density [protected]
Definition at line 207 of file Aligner.h. Referenced by operator=().

double mems::Aligner::LCB_minimum_range [protected]
Definition at line 208 of file Aligner.h. Referenced by operator=().

uint mems::Aligner::max_extension_iters [protected]
Maximum number of attempts at LCB extension.
Definition at line 210 of file Aligner.h. Referenced by operator=(), RecursiveAnchorSearch(), and SetMaxExtensionIterations().

gnSeqI mems::Aligner::min_recursive_gap_length [protected]
Minimum size of gap regions that will be recursed on.
Definition at line 214 of file Aligner.h. Referenced by operator=(), Recursion(), and SetMinRecursionGapLength().

MaskedMemHash mems::Aligner::nway_mh [protected]
Used during recursive alignment to find n-way matches only.
Definition at line 203 of file Aligner.h. Referenced by align(), operator=(), Recursion(), and RecursiveAnchorSearch().

std::string mems::Aligner::permutation_filename [protected]
Definition at line 226 of file Aligner.h. Referenced by operator=() and RecursiveAnchorSearch().

int64 mems::Aligner::permutation_weight [protected]
Definition at line 227 of file Aligner.h. Referenced by operator=() and RecursiveAnchorSearch().

boolean mems::Aligner::recursive [protected]
Set to true if a recursive anchor search/gapped alignment should be performed.
Definition at line 218 of file Aligner.h. Referenced by RecursiveAnchorSearch().

std::vector< search_cache_t > mems::Aligner::search_cache [protected]
A list of recursive searches that have already been done.
Definition at line 229 of file Aligner.h. Referenced by RecursiveAnchorSearch() and SearchWithinLCB().
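
One way to picture a list of already-performed searches is a sorted vector that is consulted before each new search. The sketch below is an assumption-laden illustration: the layout of search_cache_t is not shown on this page, so ToySearchKey (a pair of region boundaries) and shouldSearch are hypothetical, and the default pair ordering stands in for whatever mems::cache_comparator does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical cache key: left and right boundary of a searched region.
using ToySearchKey = std::pair<std::int64_t, std::int64_t>;

// Returns true if the region has not been searched yet and records it;
// returns false if an identical search is already in the cache.
// The cache is kept sorted so that lookups are binary searches.
bool shouldSearch(std::vector<ToySearchKey>& cache, const ToySearchKey& region) {
    assert(region.first <= region.second);  // assumed precondition
    auto pos = std::lower_bound(cache.begin(), cache.end(), region);
    if (pos != cache.end() && *pos == region) return false;
    cache.insert(pos, region);
    return true;
}
```

Skipping regions that were searched in an earlier pass avoids repeating work when the surrounding anchors have not changed.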

uint32 mems::Aligner::seq_count [protected]
The number of sequences this aligner is working with.
Definition at line 204 of file Aligner.h. Referenced by align(), AlignLCB(), operator=(), Recursion(), RecursiveAnchorSearch(), SearchWithinLCB(), and WritePermutation().