mems::Aligner Class Reference

Used to find locally collinear blocks (LCBs) and do recursive alignments on the blocks. To create an alignment, one need only use the align method.

#include <Aligner.h>


Public Member Functions

void align (MatchList &mlist, IntervalList &interval_list, double LCB_minimum_density, double LCB_minimum_range, boolean recursive, boolean extend_lcbs, boolean gapped_alignment, std::string tree_filename="")
 Note: this algorithm differs from the one reported in the Mauve paper. The modifications should make the Mauve method more sensitive. Given an initial set of multi-MUMs, the alignment is an eight-step process; the steps are listed in the detailed documentation of align() under Member Function Documentation.

 Aligner (const Aligner &al)
 Aligner (uint seq_count)
 Constructs an aligner for the specified number of sequences.

void AlignLCB (MatchList &mlist, Interval &iv)
void DoSomethingCool (MatchList &mlist, Interval &iv)
void GetBestLCB (MatchList &r_list, MatchList &best_lcb)
Aligner & operator= (const Aligner &al)
void Recursion (MatchList &r_list, Match *r_begin, Match *r_end, boolean nway_only=false)
void RecursiveAnchorSearch (MatchList &mlist, gnSeqI minimum_weight, std::vector< MatchList > &LCB_list, boolean entire_genome, std::ostream *status_out=NULL)
void SearchWithinLCB (MatchList &mlist, std::vector< search_cache_t > &new_cache, bool leftmost=false, bool rightmost=false)
void SetGappedAligner (GappedAligner &gal)
void SetMaxExtensionIterations (uint ext_iters)
void SetMaxGappedAlignmentLength (gnSeqI len)
 Forwards the request to whatever gapped aligner is being used.

void SetMinRecursionGapLength (gnSeqI min_r_gap)
 Set the minimum size of intervening region between two anchor matches that will be considered for recursive anchor determination.

void SetPermutationOutput (std::string &permutation_filename, int64 permutation_weight)
 Set output parameters for permutation matrices.

void SetRecursive (bool value)
void WritePermutation (std::vector< LCB > &adjacencies, std::string out_filename)

Protected Member Functions

void consistencyCheck (uint lcb_count, std::vector< LCB > &adjacencies, std::vector< MatchList > &lcb_list, std::vector< int64 > &weights)

Protected Attributes

boolean collinear_genomes
 Set to true if all genomes are assumed to be collinear.

int64 cur_min_coverage
 Tracks the weight of the lowest-weight LCB.

boolean currently_recursing
 True when the recursive search has begun.

boolean debug
 Flag for debugging output.

boolean extend_lcbs
 Set to true if LCB extension should be attempted.

GappedAligner * gal
TLS< MemHash > gap_mh
 Used during recursive alignment.

boolean gapped_alignment
 Set to true to complete a gapped alignment.

double LCB_minimum_density
double LCB_minimum_range
uint max_extension_iters
 Maximum number of attempts at LCB extension.

gnSeqI min_recursive_gap_length
 Minimum size of gap regions that will be recursed on.

MaskedMemHash nway_mh
 Used during recursive alignment to find nway matches only.

std::string permutation_filename
int64 permutation_weight
boolean recursive
 Set to true if a recursive anchor search/gapped alignment should be performed.

std::vector< search_cache_t > search_cache
 A list of recursive searches that have already been done.

uint32 seq_count
 The number of sequences this aligner is working with.


Detailed Description

Used to find locally collinear blocks (LCBs) and do recursive alignments on the blocks. To create an alignment, one need only use the align method.

LCB lists are typically stored using the IntervalList class. They can be read and written in interval format using that class. For input and output of gapped alignments in other formats, see the gnAlignedSequences class. Other methods in this class are available for experimentation.

Definition at line 142 of file Aligner.h.


Constructor & Destructor Documentation

mems::Aligner::Aligner (uint seq_count)
 

Constructs an aligner for the specified number of sequences.

Parameters:
seq_count The number of sequences that will be aligned with this Aligner

Definition at line 180 of file Aligner.cpp.

References mems::default_min_r_gap_size, and uint.

mems::Aligner::Aligner (const Aligner & al)
 

Definition at line 191 of file Aligner.cpp.


Member Function Documentation

void mems::Aligner::align (MatchList & mlist, IntervalList & interval_list, double LCB_minimum_density, double LCB_minimum_range, boolean recursive, boolean extend_lcbs, boolean gapped_alignment, std::string tree_filename = "")
 

Note: this algorithm differs from the one reported in the Mauve paper. The modifications should make the Mauve method more sensitive. Given an initial set of multi-MUMs, the alignment is an eight-step process:

1) Eliminate overlaps among the multi-MUMs
2) Compute a phylogenetic guide tree using the multi-MUMs
3) Remove subset multi-MUMs
4) Identify regions of collinearity (LCBs) among the remaining n-way multi-MUMs
5) Perform recursive anchor search within and outside LCBs
 5a) search outside until weight stabilizes
 5b) search within LCBs
6) Use greedy breakpoint elimination to remove low-weight LCBs
 6a) whenever two LCBs coalesce, search the intervening region for multi-MUMs
7) Repeat 4, 5 and 6 until the total weight stabilizes
8) Perform gapped alignment on each LCB

When limited area DP and POA are integrated, step 8 will become step 5c.

Takes a MatchList as input and outputs a list of LCBs as an IntervalList. Several of the options can be used to filter out unlikely LCBs. If the recursive option is specified, the regions between matches in each LCB are searched for further homology and a full gapped alignment is produced.
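Step 7 of the process repeats steps 4 to 6 until the total weight stabilizes. The sketch below illustrates only that control flow. The function iterateUntilStable, its refine_once callback, and the max_rounds safety cap are hypothetical names introduced for illustration; they are not part of libMems, and the real stopping criterion may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical driver: applies one refinement pass (standing in for steps
// 4 to 6) repeatedly until the total anchor weight stops changing.
// `refine_once` takes the current total weight and returns the new one.
std::int64_t iterateUntilStable(
    std::int64_t initial_weight,
    const std::function<std::int64_t(std::int64_t)>& refine_once,
    unsigned max_rounds = 1000)
{
    assert(max_rounds > 0);
    std::int64_t prev = initial_weight;
    for (unsigned round = 0; round < max_rounds; ++round) {
        const std::int64_t next = refine_once(prev);
        if (next == prev)   // weight stabilized: stop iterating
            return next;
        prev = next;
    }
    return prev;            // safety cap reached without stabilizing
}
```

A caller would pass a callback that performs one round of LCB identification, anchor search, and breakpoint elimination and reports the resulting total weight.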

Parameters:
mlist The MatchList to use as input for the alignment process
interval_list The IntervalList that is created by the alignment process
LCB_minimum_density The minimum density that an LCB must have to be considered a valid block. This should be a number between 0 and 1.
LCB_minimum_range A misnomer: this is really the minimum number of matching base pairs an LCB must contain to be considered an LCB. Coverage is defined as (length of match) * (# of matching sequences).
recursive Option for performing a recursive alignment. If this is set to true, all regions which have gaps will be searched for exact matches.
extend_lcbs If true, attempt to extend the boundaries of LCBs by searching for additional matches between LCBs
tree_filename The name of the output file to write the phylogenetic guide tree into. If an empty string is specified then a temporary file is created.
Exceptions:
AlignerError may be thrown if an error occurs
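The coverage formula given for LCB_minimum_range can be worked through on a small example. This is a minimal sketch under stated assumptions: SimpleMatch is a hypothetical stand-in for a match (not the libMems Match class), and summing coverage over the matches of a block is an assumption about how the per-match formula extends to a whole LCB.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a match: its length in base pairs and the
// number of sequences it spans.
struct SimpleMatch {
    std::int64_t length;
    unsigned multiplicity;
};

// Coverage of one match: (length of match) * (# of matching sequences).
std::int64_t matchCoverage(const SimpleMatch& m)
{
    assert(m.length >= 0);
    return m.length * static_cast<std::int64_t>(m.multiplicity);
}

// Assumption: the coverage of a block is the sum over its matches.
std::int64_t blockCoverage(const std::vector<SimpleMatch>& block)
{
    std::int64_t total = 0;
    for (const SimpleMatch& m : block)
        total += matchCoverage(m);
    return total;
}

// True when the block reaches the minimum coverage threshold.
bool passesMinimumCoverage(const std::vector<SimpleMatch>& block,
                           double minimum)
{
    return static_cast<double>(blockCoverage(block)) >= minimum;
}
```

For instance, a block holding a 100 bp match across three sequences and a 50 bp match across two sequences has coverage 100 * 3 + 50 * 2 under this assumption.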

Definition at line 2192 of file Aligner.cpp.

References mems::addUnalignedIntervals(), mems::AlignLCBInParallel(), collinear_genomes, CreateTempFileName(), mems::MuscleInterface::CreateTree(), mems::AlnProgressTracker::cur_leftend, currently_recursing, mems::DistanceMatrix(), mems::EliminateOverlaps(), gal, mems::Interval, mems::IntervalList, mems::MatchList, mems::GenericMatchList< MatchPtrType >::MultiplicityFilter(), nway_mh, mems::AlnProgressTracker::prev_progress, RecursiveAnchorSearch(), seq_count, mems::GenericMatchList< MatchPtrType >::seq_filename, mems::GenericIntervalList< MatchType >::seq_filename, mems::GenericIntervalList< MatchType >::seq_table, mems::GenericMatchList< MatchPtrType >::seq_table, mems::MaskedMemHash::SetMask(), mems::GenericInterval< GappedBaseImpl >::SetMatches(), mems::AlnProgressTracker::total_len, uint, and mems::uint64.

void mems::Aligner::AlignLCB (MatchList & mlist, Interval & iv)
 

Definition at line 1370 of file Aligner.cpp.

References mems::GappedAligner::Align(), collinear_genomes, debug, gal, mems::Interval, mems::Match, mems::MatchList, seq_count, mems::GenericMatchList< MatchPtrType >::seq_table, mems::GenericInterval< GappedBaseImpl >::SetMatches(), and uint.

void mems::Aligner::consistencyCheck (uint lcb_count, std::vector< LCB > & adjacencies, std::vector< MatchList > & lcb_list, std::vector< int64 > & weights) [protected]
 

Definition at line 1585 of file Aligner.cpp.

References mems::AaronsLCB(), mems::filterMatches(), mems::MatchList, and uint.

void mems::Aligner::DoSomethingCool (MatchList & mlist, Interval & iv)
 

void mems::Aligner::GetBestLCB (MatchList & r_list, MatchList & best_lcb)
 

Aligner & mems::Aligner::operator= (const Aligner & al)
 

Definition at line 207 of file Aligner.cpp.

References collinear_genomes, cur_min_coverage, debug, gal, gap_mh, LCB_minimum_density, LCB_minimum_range, max_extension_iters, min_recursive_gap_length, nway_mh, permutation_filename, permutation_weight, and seq_count.

void mems::Aligner::Recursion (MatchList & r_list, Match * r_begin, Match * r_end, boolean nway_only = false)
 

Definition at line 1078 of file Aligner.cpp.

References mems::MemHash::Clear(), mems::MemHash::ClearSequences(), debug, mems::EliminateOverlaps(), mems::MemHash::FindMatches(), gap_mh, getDefaultSeedWeight(), mems::getInterveningCoordinates(), mems::MemHash::GetMatchList(), getSeed(), mems::UngappedLocalAlignment< AbstractMatchImpl >::Length(), mems::GenericMatchList< MatchPtrType >::LengthFilter(), mems::Match, mems::MatchList, MIN_DNA_SEED_WEIGHT, min_recursive_gap_length, mems::GenericMatchList< MatchPtrType >::MultiplicityFilter(), nway_mh, seq_count, mems::GenericMatchList< MatchPtrType >::seq_table, mems::GenericMatchList< MatchPtrType >::sml_table, uint, and mems::uint64.

Referenced by SearchWithinLCB().

void mems::Aligner::RecursiveAnchorSearch (MatchList & mlist, gnSeqI minimum_weight, std::vector< MatchList > & LCB_list, boolean entire_genome, std::ostream * status_out = NULL)
 

Definition at line 1951 of file Aligner.cpp.

References mems::AaronsLCB(), mems::computeLCBAdjacencies_v2(), mems::ComputeLCBs(), mems::CreateGapSearchList(), cur_min_coverage, currently_recursing, extend_lcbs, mems::filterMatches(), mems::greedyBreakpointElimination_v4(), mems::Interval, mems::IntervalList, mems::MatchList, max_extension_iters, mems::NO_ADJACENCY, nway_mh, permutation_filename, permutation_weight, recursive, search_cache, mems::SearchLCBGaps(), SearchWithinLCB(), seq_count, mems::GenericIntervalList< MatchType >::seq_filename, mems::GenericMatchList< MatchPtrType >::seq_filename, mems::GenericIntervalList< MatchType >::seq_table, mems::GenericMatchList< MatchPtrType >::seq_table, uint, WritePermutation(), and mems::WritePermutationCoordinates().

Referenced by align().

void mems::Aligner::SearchWithinLCB (MatchList & mlist, std::vector< search_cache_t > & new_cache, bool leftmost = false, bool rightmost = false)
 

Definition at line 1472 of file Aligner.cpp.

References mems::cache_comparator, debug, mems::EliminateOverlaps(), mems::Match, mems::MatchList, mems::GenericMatchList< MatchPtrType >::MultiplicityFilter(), Recursion(), search_cache, mems::search_cache_t, seq_count, and uint.

Referenced by RecursiveAnchorSearch().

void mems::Aligner::SetGappedAligner (GappedAligner & gal)
 

Definition at line 235 of file Aligner.cpp.

void mems::Aligner::SetMaxExtensionIterations (uint ext_iters) [inline]
 

Definition at line 186 of file Aligner.h.

References max_extension_iters, and uint.

void mems::Aligner::SetMaxGappedAlignmentLength (gnSeqI len)
 

Forwards the request to whatever gapped aligner is being used.
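The forwarding relationship between the aligner and its gapped aligner can be sketched with two stand-in classes. DemoAligner and DemoGappedAligner are hypothetical and only mirror the method names listed on this page; the assertion that a gapped aligner must be set before the limit is forwarded is an assumption, not documented libMems behavior.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a gapped aligner back end; not the libMems GappedAligner.
class DemoGappedAligner {
public:
    void SetMaxAlignmentLength(std::uint64_t len) { max_length_ = len; }
    std::uint64_t MaxAlignmentLength() const { return max_length_; }
private:
    std::uint64_t max_length_ = 0;
};

// Stand-in for the owning aligner: it keeps a non-owning pointer to the
// back end and forwards the length limit to it.
class DemoAligner {
public:
    void SetGappedAligner(DemoGappedAligner& gal) { gal_ = &gal; }

    void SetMaxGappedAlignmentLength(std::uint64_t len)
    {
        assert(gal_ != nullptr);  // assumption: a back end is set first
        gal_->SetMaxAlignmentLength(len);
    }
private:
    DemoGappedAligner* gal_ = nullptr;
};
```

Because the pointer is non-owning in this sketch, the back end object must outlive the aligner that refers to it.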

Definition at line 239 of file Aligner.cpp.

References gal, and mems::GappedAligner::SetMaxAlignmentLength().

void mems::Aligner::SetMinRecursionGapLength (gnSeqI min_r_gap)
 

Set the minimum size of intervening region between two anchor matches that will be considered for recursive anchor determination.

When the gap between two anchors is shorter than this cutoff value, the region is handed off to the dynamic programming aligner (e.g. ClustalW).
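The cutoff rule can be illustrated on a single sequence. In this minimal sketch, Anchor, GapAction, classifyGap, and gapLengths are hypothetical names, and measuring the gap in one sequence only is a simplification: the actual intervening region spans several sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical anchor in one sequence: start coordinate and length.
struct Anchor {
    std::uint64_t start;
    std::uint64_t length;
};

enum class GapAction { RecursiveAnchorSearch, DynamicProgramming };

// Gaps shorter than the cutoff go to the dynamic programming aligner;
// all others are candidates for recursive anchor search.
GapAction classifyGap(std::uint64_t gap_length, std::uint64_t min_r_gap)
{
    return gap_length < min_r_gap ? GapAction::DynamicProgramming
                                  : GapAction::RecursiveAnchorSearch;
}

// Lengths of the regions between consecutive anchors. The anchors are
// assumed to be sorted by start coordinate and non-overlapping.
std::vector<std::uint64_t> gapLengths(const std::vector<Anchor>& anchors)
{
    std::vector<std::uint64_t> gaps;
    for (std::size_t i = 1; i < anchors.size(); ++i) {
        const std::uint64_t prev_end =
            anchors[i - 1].start + anchors[i - 1].length;
        assert(anchors[i].start >= prev_end);
        gaps.push_back(anchors[i].start - prev_end);
    }
    return gaps;
}
```

A gap exactly equal to the cutoff is treated here as large enough for recursive search, following the "less than" wording of the description.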

Definition at line 231 of file Aligner.cpp.

References min_recursive_gap_length.

void mems::Aligner::SetPermutationOutput (std::string & permutation_filename, int64 permutation_weight)
 

Set output parameters for permutation matrices.

Definition at line 592 of file Aligner.cpp.

void mems::Aligner::SetRecursive (bool value) [inline]
 

Definition at line 200 of file Aligner.h.

void mems::Aligner::WritePermutation (std::vector< LCB > & adjacencies, std::string out_filename)
 

Definition at line 1886 of file Aligner.cpp.

References seq_count, and uint.

Referenced by RecursiveAnchorSearch().


Member Data Documentation

boolean mems::Aligner::collinear_genomes [protected]
 

Set to true if all genomes are assumed to be collinear.

Definition at line 222 of file Aligner.h.

Referenced by align(), AlignLCB(), and operator=().

int64 mems::Aligner::cur_min_coverage [protected]
 

Tracks the weight of the lowest-weight LCB.

Definition at line 212 of file Aligner.h.

Referenced by operator=(), and RecursiveAnchorSearch().

boolean mems::Aligner::currently_recursing [protected]
 

True when the recursive search has begun.

Definition at line 221 of file Aligner.h.

Referenced by align(), and RecursiveAnchorSearch().

boolean mems::Aligner::debug [protected]
 

Flag for debugging output.

Reimplemented in mems::ProgressiveAligner.

Definition at line 205 of file Aligner.h.

Referenced by AlignLCB(), operator=(), Recursion(), and SearchWithinLCB().

boolean mems::Aligner::extend_lcbs [protected]
 

Set to true if LCB extension should be attempted.

Definition at line 219 of file Aligner.h.

Referenced by RecursiveAnchorSearch().

GappedAligner* mems::Aligner::gal [protected]
 

Definition at line 224 of file Aligner.h.

Referenced by align(), AlignLCB(), operator=(), and SetMaxGappedAlignmentLength().

TLS<MemHash> mems::Aligner::gap_mh [protected]
 

Used during recursive alignment.

Definition at line 202 of file Aligner.h.

Referenced by operator=(), and Recursion().

boolean mems::Aligner::gapped_alignment [protected]
 

Set to true to complete a gapped alignment.

Definition at line 220 of file Aligner.h.

double mems::Aligner::LCB_minimum_density [protected]
 

Definition at line 207 of file Aligner.h.

Referenced by operator=().

double mems::Aligner::LCB_minimum_range [protected]
 

Definition at line 208 of file Aligner.h.

Referenced by operator=().

uint mems::Aligner::max_extension_iters [protected]
 

Maximum number of attempts at LCB extension.

Definition at line 210 of file Aligner.h.

Referenced by operator=(), RecursiveAnchorSearch(), and SetMaxExtensionIterations().

gnSeqI mems::Aligner::min_recursive_gap_length [protected]
 

Minimum size of gap regions that will be recursed on.

Definition at line 214 of file Aligner.h.

Referenced by operator=(), Recursion(), and SetMinRecursionGapLength().

MaskedMemHash mems::Aligner::nway_mh [protected]
 

Used during recursive alignment to find nway matches only.

Definition at line 203 of file Aligner.h.

Referenced by align(), operator=(), Recursion(), and RecursiveAnchorSearch().

std::string mems::Aligner::permutation_filename [protected]
 

Definition at line 226 of file Aligner.h.

Referenced by operator=(), and RecursiveAnchorSearch().

int64 mems::Aligner::permutation_weight [protected]
 

Definition at line 227 of file Aligner.h.

Referenced by operator=(), and RecursiveAnchorSearch().

boolean mems::Aligner::recursive [protected]
 

Set to true if a recursive anchor search/gapped alignment should be performed.

Definition at line 218 of file Aligner.h.

Referenced by RecursiveAnchorSearch().

std::vector< search_cache_t > mems::Aligner::search_cache [protected]
 

A list of recursive searches that have already been done.
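One way to keep such a list so that repeated searches can be skipped is a sorted vector with binary search. The sketch below is an assumption about the general technique only: SearchKey and SearchCache are hypothetical, and the real search_cache_t type and its cache_comparator are not described on this page.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical key: identifiers of the two anchors bounding a region.
using SearchKey = std::pair<long, long>;

class SearchCache {
public:
    // Returns true if the region had not been searched yet and records
    // it; returns false if the same region was already recorded.
    bool recordIfNew(const SearchKey& key)
    {
        auto pos = std::lower_bound(done_.begin(), done_.end(), key);
        if (pos != done_.end() && *pos == key)
            return false;
        done_.insert(pos, key);  // insertion keeps the vector sorted
        assert(std::is_sorted(done_.begin(), done_.end()));
        return true;
    }

    std::size_t size() const { return done_.size(); }

private:
    std::vector<SearchKey> done_;  // sorted list of completed searches
};
```

A caller would run the expensive search only when recordIfNew returns true.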

Definition at line 229 of file Aligner.h.

Referenced by RecursiveAnchorSearch(), and SearchWithinLCB().

uint32 mems::Aligner::seq_count [protected]
 

The number of sequences this aligner is working with.

Definition at line 204 of file Aligner.h.

Referenced by align(), AlignLCB(), operator=(), Recursion(), RecursiveAnchorSearch(), SearchWithinLCB(), and WritePermutation().


The documentation for this class was generated from the following files:
Aligner.h
Aligner.cpp
Generated on Fri Mar 14 06:01:38 2008 for libMems by doxygen 1.3.6