mems::MemHash Class Reference

MemHash implements an algorithm for finding exact matches of a certain minimal length in several sequences.

#include <MemHash.h>


Public Member Functions

virtual void Clear ()
virtual void ClearSequences ()
virtual MemHash * Clone () const
virtual boolean CreateMatches ()
 Generates exact matches for the sequences loaded into this MemHash.

virtual void FindMatches (MatchList &match_list)
 Finds all maximal exact matches in the sequences contained by "match_list". The resulting list of matches is stored within "match_list".

virtual void FindMatchesFromPosition (MatchList &match_list, const std::vector< gnSeqI > &start_points)
virtual uint32 GetEnumerationTolerance () const
template<class MatchListType> void GetMatchList (MatchListType &mem_list) const
 Use this to convert the MatchHashEntry mem list to a generic match list type. Converts the mem list into the type specified by MatchListType.

virtual MatchList GetMatchList () const
 Creates a new MatchList instance which contains all the matches found by calling Create().

virtual uint32 GetRepeatTolerance () const
virtual void LoadFile (std::istream &mem_file)
 Reads in a list of mems from an input stream.
Exceptions:
An InvalidFileFormat exception is thrown if the file format is unknown or the file is corrupt.


virtual uint32 MemCollisionCount ()
 Returns the number of mers thrown out because they were contained in an existing mem.

virtual uint32 MemCount ()
 Returns the number of mems found.

 MemHash (const MemHash &mh)
 MemHash ()
virtual void MemTableCount (std::vector< uint32 > &table_count)
MemHash & operator= (const MemHash &mh)
virtual void PrintDistribution (std::ostream &os) const
 Prints the number of matches in each hash table bucket to the ostream os.

virtual void SetEnumerationTolerance (uint32 enumeration_tolerance)
 Sets the match seed repeat enumeration tolerance.

void SetMatchLog (std::ostream *match_log)
 Setting this to a non-null value causes matches to be logged as they are found.

virtual void SetRepeatTolerance (uint32 repeat_tolerance)
 Sets the permitted repetitivity of match seeds.

virtual void SetTableSize (uint32 new_table_size)
 Sets the size of the hash table to new_table_size.

virtual uint32 TableSize () const
 Returns the size of the hash table being used.

virtual void WriteFile (std::ostream &mem_file) const
 Writes the matches stored in this MemHash out to the ostream.

 ~MemHash ()

Protected Member Functions

virtual MatchHashEntry * AddHashEntry (MatchHashEntry &mhe)
virtual boolean EnumerateMatches (IdmerList &match_list)
virtual boolean HashMatch (IdmerList &match_list)
 Called whenever a mer match is found.

virtual uint32 quadratic_li (uint32 listI)
virtual void SetDirection (MatchHashEntry &mhe)

Protected Attributes

std::vector< MatchHashEntry * > allocated
SlotAllocator< MatchHashEntry > & allocator
uint64 m_collision_count
uint32 m_enumeration_tolerance
uint64 m_mem_count
uint32 m_repeat_tolerance
std::ostream * match_log
std::vector< std::vector< MatchHashEntry * > > mem_table
std::vector< uint32 > mem_table_count
MheCompare mhecomp
uint32 table_size

Detailed Description

MemHash implements an algorithm for finding exact matches of a certain minimal length in several sequences.

Definition at line 38 of file MemHash.h.
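
The general idea of finding exact matches of a minimal length can be illustrated with a small seed-and-extend sketch. This is not the libMems implementation: the names ExactMatch and FindExactMatches are hypothetical, the sketch handles only two sequences held as plain strings, and it indexes mers with a standard hash map rather than the sorted mer lists that MemHash works with.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <tuple>
#include <unordered_map>
#include <vector>

// Hypothetical record for one exact match between two sequences.
struct ExactMatch {
    std::size_t start_a;  // start position in sequence a
    std::size_t start_b;  // start position in sequence b
    std::size_t length;   // number of matching characters
};

// Reports each maximal exact match of at least mer_size characters once.
std::vector<ExactMatch> FindExactMatches(const std::string& a,
                                         const std::string& b,
                                         std::size_t mer_size) {
    assert(mer_size > 0);
    std::vector<ExactMatch> result;
    if (a.size() < mer_size || b.size() < mer_size) return result;

    // Index every mer of a by its start positions.
    std::unordered_map<std::string, std::vector<std::size_t>> index;
    for (std::size_t i = 0; i + mer_size <= a.size(); ++i)
        index[a.substr(i, mer_size)].push_back(i);

    // Several seeds can extend to the same match, so remember what was reported.
    std::set<std::tuple<std::size_t, std::size_t, std::size_t>> seen;
    for (std::size_t j = 0; j + mer_size <= b.size(); ++j) {
        auto hit = index.find(b.substr(j, mer_size));
        if (hit == index.end()) continue;
        for (std::size_t i : hit->second) {
            // Extend the seed to the left while characters agree.
            std::size_t sa = i, sb = j;
            while (sa > 0 && sb > 0 && a[sa - 1] == b[sb - 1]) { --sa; --sb; }
            // Extend the seed to the right while characters agree.
            std::size_t ea = i + mer_size, eb = j + mer_size;
            while (ea < a.size() && eb < b.size() && a[ea] == b[eb]) { ++ea; ++eb; }
            if (seen.insert(std::make_tuple(sa, sb, ea - sa)).second)
                result.push_back({sa, sb, ea - sa});
        }
    }
    return result;
}
```

Calling FindExactMatches("GGACGTACTT", "CCACGTACAA", 4) reports a single match that starts at position 2 in both strings and spans 6 characters, even though three different 4-mers seed it.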


Constructor & Destructor Documentation

mems::MemHash::MemHash ()
 

Definition at line 23 of file MemHash.cpp.

References m_collision_count, m_enumeration_tolerance, m_mem_count, m_repeat_tolerance, match_log, mem_table, mem_table_count, table_size, and mems::uint32.

Referenced by Clone().

mems::MemHash::~MemHash ()
 

Definition at line 41 of file MemHash.cpp.

mems::MemHash::MemHash (const MemHash &mh)
 

Definition at line 45 of file MemHash.cpp.


Member Function Documentation

MatchHashEntry * mems::MemHash::AddHashEntry (MatchHashEntry &mhe) [protected, virtual]
 

Definition at line 209 of file MemHash.cpp.

References mems::SlotAllocator< MatchHashEntry >::Allocate(), allocated, allocator, mems::MatchHashEntry::Extended(), mems::MatchFinder::ExtendMatch(), m_collision_count, m_mem_count, match_log, mem_table, mem_table_count, mhecomp, mems::MatchHashEntry::Offset(), table_size, and mems::uint32.

Referenced by mems::RepeatHash::HashMatch(), HashMatch(), mems::MaskedMemHash::HashMatch(), and LoadFile().

void mems::MemHash::Clear () [virtual]
 

Reimplemented from mems::MatchFinder.

Definition at line 76 of file MemHash.cpp.

References allocated, allocator, mems::SlotAllocator< MatchHashEntry >::Free(), m_collision_count, m_enumeration_tolerance, m_mem_count, m_repeat_tolerance, match_log, mem_table, mem_table_count, table_size, and mems::uint32.

Referenced by mems::Aligner::Recursion(), and mems::SearchLCBGaps().

void mems::MemHash::ClearSequences () [virtual]
 

Definition at line 71 of file MemHash.cpp.

Referenced by mems::Aligner::Recursion().

MemHash * mems::MemHash::Clone () const [virtual]
 

Reimplemented in mems::MaskedMemHash, mems::PairwiseMatchFinder, mems::ParallelMemHash, and mems::RepeatHash.

Definition at line 67 of file MemHash.cpp.

References MemHash().

boolean mems::MemHash::CreateMatches () [virtual]
 

Generates exact matches for the sequences loaded into this MemHash.

Reimplemented in mems::RepeatHash.

Definition at line 104 of file MemHash.cpp.

boolean mems::MemHash::EnumerateMatches (IdmerList &match_list) [protected, virtual]
 

Reimplemented from mems::MatchFinder.

Reimplemented in mems::PairwiseMatchFinder, and mems::RepeatHash.

Definition at line 139 of file MemHash.cpp.

References HashMatch(), mems::IdmerList, m_enumeration_tolerance, and m_repeat_tolerance.

void mems::MemHash::FindMatches (MatchList &match_list) [virtual]
 

Finds all maximal exact matches in the sequences contained by "match_list". The resulting list of matches is stored within "match_list".

Definition at line 109 of file MemHash.cpp.

References FindMatchesFromPosition(), mems::MatchList, mems::GenericMatchList< MatchPtrType >::seq_table, and mems::uint32.

Referenced by mems::Aligner::Recursion(), and mems::SearchLCBGaps().

void mems::MemHash::FindMatchesFromPosition (MatchList &match_list, const std::vector< gnSeqI > &start_points) [virtual]
 

Definition at line 117 of file MemHash.cpp.

References mems::MatchFinder::AddSequence(), GetMatchList(), mems::MatchList, mems::GenericMatchList< MatchPtrType >::seq_filename, mems::GenericMatchList< MatchPtrType >::seq_table, mems::GenericMatchList< MatchPtrType >::sml_table, and mems::uint32.

Referenced by FindMatches().

virtual uint32 mems::MemHash::GetEnumerationTolerance () const [inline, virtual]
 

Returns:
the match seed repeat enumeration tolerance.
See also:
SetEnumerationTolerance

Definition at line 144 of file MemHash.h.

References m_enumeration_tolerance, and mems::uint32.

template<class MatchListType>
void mems::MemHash::GetMatchList (MatchListType &mem_list) const
 

Use this to convert the MatchHashEntry mem list to a generic match list type. Converts the mem list into the type specified by MatchListType.

Definition at line 183 of file MemHash.h.

References mem_table, table_size, and mems::uint32.

MatchList mems::MemHash::GetMatchList () const [virtual]
 

Creates a new MatchList instance which contains all the matches found by calling Create().

Definition at line 129 of file MemHash.cpp.

References mems::MatchList, mems::GenericMatchList< MatchPtrType >::seq_table, and mems::GenericMatchList< MatchPtrType >::sml_table.

Referenced by FindMatchesFromPosition(), and mems::Aligner::Recursion().

virtual uint32 mems::MemHash::GetRepeatTolerance () const [inline, virtual]
 

Returns:
the permitted repetitivity of match seeds.
See also:
SetRepeatTolerance

Definition at line 130 of file MemHash.h.

References m_repeat_tolerance, and mems::uint32.

boolean mems::MemHash::HashMatch (IdmerList &match_list) [protected, virtual]
 

Called whenever a mer match is found.

Implements mems::MatchFinder.

Reimplemented in mems::RepeatHash.

Definition at line 167 of file MemHash.cpp.

References AddHashEntry(), mems::MatchHashEntry::CalculateOffset(), mems::MatchFinder::GetSar(), mems::IdmerList, SetDirection(), and mems::UngappedLocalAlignment< AbstractMatchImpl >::SetLength().

Referenced by mems::PairwiseMatchFinder::EnumerateMatches(), and EnumerateMatches().

void mems::MemHash::LoadFile (std::istream &mem_file) [virtual]
 

Reads in a list of mems from an input stream.

Exceptions:
An InvalidFileFormat exception is thrown if the file format is unknown or the file is corrupt.

Definition at line 266 of file MemHash.cpp.

References AddHashEntry(), mems::MatchHashEntry::CalculateOffset(), mems::UngappedLocalAlignment< AbstractMatchImpl >::SetLength(), and mems::uint32.

virtual uint32 mems::MemHash::MemCollisionCount () [inline, virtual]
 

Returns the number of mers thrown out because they were contained in an existing mem.

Returns:
The number of mers thrown out because they were contained in an existing mem

Definition at line 99 of file MemHash.h.

References m_collision_count, and mems::uint32.
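
What "contained in an existing mem" means can be sketched for the two-sequence case. The sketch below is an assumption-laden illustration, not the MemHash code: DiagonalTable, Interval, Add, MatchCount and CollisionCount are hypothetical names, and bucketing matches by the difference of their start positions is only loosely inspired by the Offset() call that AddHashEntry() references.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stored match: start in sequence a plus length.
struct Interval {
    std::int64_t start;
    std::int64_t length;
};

// Buckets matches by diagonal (pos_b - pos_a) and counts rejected mers.
class DiagonalTable {
public:
    // Returns true if the match was stored, false if it was thrown out
    // because an existing match on the same diagonal already covers it.
    bool Add(std::int64_t pos_a, std::int64_t pos_b, std::int64_t length) {
        assert(length > 0);
        std::vector<Interval>& bucket = table_[pos_b - pos_a];
        for (const Interval& m : bucket) {
            if (pos_a >= m.start && pos_a + length <= m.start + m.length) {
                ++collision_count_;
                return false;
            }
        }
        bucket.push_back({pos_a, length});
        ++match_count_;
        return true;
    }

    std::uint64_t MatchCount() const { return match_count_; }
    std::uint64_t CollisionCount() const { return collision_count_; }

private:
    std::map<std::int64_t, std::vector<Interval>> table_;
    std::uint64_t match_count_ = 0;
    std::uint64_t collision_count_ = 0;
};
```

In this sketch a mer counts as a collision only when it lies on the same diagonal as a stored match and falls entirely inside it; a mer at the same position in one sequence but on a different diagonal is kept as a separate match.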

virtual uint32 mems::MemHash::MemCount () [inline, virtual]
 

Returns the number of mems found.

Returns:
The number of mems found

Definition at line 94 of file MemHash.h.

References m_mem_count, and mems::uint32.

virtual void mems::MemHash::MemTableCount (std::vector< uint32 > &table_count) [inline, virtual]
 

Definition at line 100 of file MemHash.h.

References mem_table_count.

MemHash & mems::MemHash::operator= (const MemHash &mh)
 

Definition at line 50 of file MemHash.cpp.

References m_collision_count, m_enumeration_tolerance, m_mem_count, m_repeat_tolerance, match_log, mem_table, mem_table_count, mems::MatchFinder::mer_size, mems::MatchFinder::seq_count, table_size, and mems::uint32.

void mems::MemHash::PrintDistribution (std::ostream &os) const [virtual]
 

Prints the number of matches in each hash table bucket to the ostream os.

Parameters:
os The stream to print to.

Definition at line 253 of file MemHash.cpp.

References mem_table, mem_table_count, and mems::uint32.
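
A per-bucket report of this kind can be sketched as follows. The output format used by PrintDistribution is not documented here, so the tab-separated "bucket index, entry count" layout and the name PrintBucketSizes are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes one line per bucket: bucket index, a tab, then the number of
// entries in that bucket. The layout is an assumed, illustrative format.
template <class Entry>
void PrintBucketSizes(const std::vector<std::vector<Entry>>& table,
                      std::ostream& os) {
    for (std::size_t i = 0; i < table.size(); ++i)
        os << i << '\t' << table[i].size() << '\n';
}
```

A strongly skewed distribution in such a report suggests that the table size or the hash function spreads matches poorly across buckets.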

virtual uint32 mems::MemHash::quadratic_li (uint32 listI) [inline, protected, virtual]
 

Definition at line 160 of file MemHash.h.

References mems::uint32.

void mems::MemHash::SetDirection (MatchHashEntry &mhe) [protected, virtual]
 

Definition at line 189 of file MemHash.cpp.

References mems::SortedMerList::GetMer(), mems::MatchFinder::GetSar(), and mems::uint32.

Referenced by mems::RepeatHash::HashMatch(), HashMatch(), and mems::MaskedMemHash::HashMatch().

virtual void mems::MemHash::SetEnumerationTolerance (uint32 enumeration_tolerance) [inline, virtual]
 

Sets the match seed repeat enumeration tolerance.

When matching mers are found across sequences and also occur several times within any particular sequence, several possible match seeds could be generated. The enumeration tolerance controls how many of these possibilities are actually used as match seeds and extended into full matches. The selection of actual seeds from the possibilities is essentially arbitrary, though not explicitly randomized.

Definition at line 139 of file MemHash.h.

References m_enumeration_tolerance, and mems::uint32.
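
The effect of capping seed enumeration can be sketched with a small standalone function. This is a hypothetical illustration: EnumerateSeeds and its max_seeds parameter are not part of libMems, and how the real enumeration tolerance value maps to a number of seeds is not specified here. The sketch takes, for each sequence, the positions where a mer occurs and walks the possible combinations in a fixed order, stopping at the cap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// positions[s] lists where the mer occurs in sequence s. Each seed picks one
// position per sequence. At most max_seeds combinations are returned, in a
// fixed (not randomized) order with the last sequence varying fastest.
std::vector<std::vector<std::size_t>> EnumerateSeeds(
        const std::vector<std::vector<std::size_t>>& positions,
        std::size_t max_seeds) {
    std::vector<std::vector<std::size_t>> seeds;
    if (positions.empty() || max_seeds == 0) return seeds;
    for (const auto& p : positions)
        if (p.empty()) return seeds;  // no seed without a hit in every sequence

    std::vector<std::size_t> idx(positions.size(), 0);  // odometer over choices
    while (seeds.size() < max_seeds) {
        std::vector<std::size_t> seed;
        for (std::size_t s = 0; s < positions.size(); ++s)
            seed.push_back(positions[s][idx[s]]);
        seeds.push_back(seed);

        // Advance the odometer; stop once every combination has been produced.
        std::size_t s = positions.size();
        while (s > 0) {
            --s;
            if (++idx[s] < positions[s].size()) break;
            idx[s] = 0;
            if (s == 0) return seeds;
        }
    }
    return seeds;
}
```

With occurrences {1, 5}, {10} and {20, 21, 22} there are six possible seeds; a cap of 4 keeps the first four in odometer order, which shows why the retained seeds are arbitrary but reproducible.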

void mems::MemHash::SetMatchLog (std::ostream *match_log) [inline]
 

Setting this to a non-null value causes matches to be logged as they are found.

Definition at line 149 of file MemHash.h.

virtual void mems::MemHash::SetRepeatTolerance (uint32 repeat_tolerance) [inline, virtual]
 

Sets the permitted repetitivity of match seeds.

Set repeat_tolerance to 0 to generate MUMs; any higher setting will generate MEMs. Many possible combinations of repetitive seed matches may be ignored, depending on the setting of the repeat enumeration tolerance.

See also:
SetEnumerationTolerance
Parameters:
repeat_tolerance the permitted repetitivity of match seeds

Definition at line 125 of file MemHash.h.

References m_repeat_tolerance, and mems::uint32.
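
The difference between a tolerance of 0 (unique matches) and higher values (repeated matches permitted) can be sketched as a seed filter. The interpretation used below is an assumption, not taken from the libMems source: a mer is accepted when it occurs in every sequence and has at most repeat_tolerance extra occurrences in each one. CountOccurrences and SeedPermitted are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Counts (possibly overlapping) occurrences of mer in seq.
std::size_t CountOccurrences(const std::string& seq, const std::string& mer) {
    std::size_t count = 0;
    for (std::size_t pos = seq.find(mer); pos != std::string::npos;
         pos = seq.find(mer, pos + 1))
        ++count;
    return count;
}

// Assumed rule: the mer must occur in every sequence, and in each sequence
// it may be repeated at most repeat_tolerance times beyond its first copy.
bool SeedPermitted(const std::vector<std::string>& sequences,
                   const std::string& mer,
                   std::uint32_t repeat_tolerance) {
    assert(!mer.empty());
    for (const std::string& seq : sequences) {
        std::size_t count = CountOccurrences(seq, mer);
        if (count == 0) return false;
        if (count - 1 > repeat_tolerance) return false;
    }
    return true;
}
```

Under this rule the mer "ACGT" is rejected for the sequences "ACGTACGT" and "GGACGT" at tolerance 0, because it occurs twice in the first sequence, and accepted at tolerance 1.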

void mems::MemHash::SetTableSize (uint32 new_table_size) [virtual]
 

Sets the size of the hash table to new_table_size.

Parameters:
new_table_size The new hash table size

Definition at line 95 of file MemHash.cpp.

References mem_table, mem_table_count, table_size, and mems::uint32.

virtual uint32 mems::MemHash::TableSize () const [inline, virtual]
 

Returns the size of the hash table being used.

Returns:
the size of the hash table being used.

Definition at line 67 of file MemHash.h.

References table_size, and mems::uint32.

void mems::MemHash::WriteFile (std::ostream &mem_file) const [virtual]
 

Writes the matches stored in this MemHash out to the ostream.

Parameters:
mem_file The stream to write the matches to.

Definition at line 306 of file MemHash.cpp.

References m_mem_count, mem_table, table_size, and mems::uint32.


Member Data Documentation

std::vector<MatchHashEntry*> mems::MemHash::allocated [protected]
 

Definition at line 172 of file MemHash.h.

Referenced by AddHashEntry(), and Clear().

SlotAllocator<MatchHashEntry>& mems::MemHash::allocator [protected]
 

Definition at line 171 of file MemHash.h.

Referenced by AddHashEntry(), and Clear().

uint64 mems::MemHash::m_collision_count [protected]
 

Definition at line 167 of file MemHash.h.

Referenced by AddHashEntry(), Clear(), MemCollisionCount(), MemHash(), and operator=().

uint32 mems::MemHash::m_enumeration_tolerance [protected]
 

Definition at line 165 of file MemHash.h.

Referenced by Clear(), EnumerateMatches(), GetEnumerationTolerance(), MemHash(), operator=(), and SetEnumerationTolerance().

uint64 mems::MemHash::m_mem_count [protected]
 

Definition at line 166 of file MemHash.h.

Referenced by AddHashEntry(), Clear(), MemCount(), MemHash(), operator=(), and WriteFile().

uint32 mems::MemHash::m_repeat_tolerance [protected]
 

Definition at line 164 of file MemHash.h.

Referenced by Clear(), EnumerateMatches(), GetRepeatTolerance(), MemHash(), operator=(), and SetRepeatTolerance().

std::ostream* mems::MemHash::match_log [protected]
 

Definition at line 170 of file MemHash.h.

Referenced by AddHashEntry(), Clear(), MemHash(), and operator=().

std::vector< std::vector<MatchHashEntry*> > mems::MemHash::mem_table [protected]
 

Definition at line 163 of file MemHash.h.

Referenced by AddHashEntry(), Clear(), GetMatchList(), MemHash(), operator=(), PrintDistribution(), SetTableSize(), and WriteFile().

std::vector<uint32> mems::MemHash::mem_table_count [protected]
 

Definition at line 168 of file MemHash.h.

Referenced by AddHashEntry(), Clear(), MemHash(), MemTableCount(), operator=(), PrintDistribution(), and SetTableSize().

MheCompare mems::MemHash::mhecomp [protected]
 

Definition at line 174 of file MemHash.h.

Referenced by AddHashEntry().

uint32 mems::MemHash::table_size [protected]
 

Definition at line 162 of file MemHash.h.

Referenced by AddHashEntry(), Clear(), GetMatchList(), MemHash(), operator=(), SetTableSize(), TableSize(), and WriteFile().


The documentation for this class was generated from the following files:
MemHash.h
MemHash.cpp
Generated on Fri Mar 14 06:01:42 2008 for libMems by doxygen 1.3.6