mems::FileSML Class Reference

#include <FileSML.h>

Public Member Functions

virtual void BigCreate (const genome::gnSequence &seq, const uint32 split_levels, const uint32 mersize=DNA_MER_SIZE)
 Creates large sorted mer lists which do not fit entirely in memory.

virtual void Clear ()
 Set data structures to default values.

virtual FileSML * Clone () const=0
virtual void Create (const genome::gnSequence &seq, const uint64 seed)
 Creates a new sorted mer list.

void dmCreate (const genome::gnSequence &seq, const uint64 seed)
 FileSML ()
virtual uint32 FormatVersion ()
const std::vector< int64 > & getUsedCoordinates () const
virtual void LoadFile (const std::string &fname)
 Loads an existing sorted mer list from a file on disk.

virtual void Merge (SortedMerList &sa, SortedMerList &sa2)
 Merges two SortedMerLists.

FileSML & operator= (const FileSML &sa)
virtual bmer operator[] (gnSeqI index)
 Get the mer at the specified index in the sorted mer list.

virtual void RadixSort (std::vector< bmer > &s_array)
virtual boolean Read (std::vector< bmer > &readVector, gnSeqI size, gnSeqI offset=0)
 Read a range of mers in the sorted mer list.

virtual void SetDescription (const std::string &d)
 Sets the freeform text description of the SML.

virtual void SetID (const sarID_t d)
 Ignore this.

virtual gnSeqI UniqueMerCount ()
 Returns the number of unique mers in the sequence.


Static Public Member Functions

const char * getTempPath (int pathI)
int getTempPathCount ()
uint64 MemoryMinimum ()
void registerTempPath (const std::string &tmp_path)

Protected Member Functions

smlSeqI_t * base ()
virtual uint64 GetNeededMemory (gnSeqI len)=0
 Calculates and returns the amount of memory needed to create a sorted mer list for a sequence of the specified length.

virtual void OpenForWriting (boolean truncate=false)
 Reopens the sarfile fstream in read/write mode.
Exceptions:
FileNotOpened thrown if the file could not be opened for writing.


virtual boolean WriteHeader ()
 Writes the SML header to disk.
Exceptions:
FileNotOpened thrown if the file could not be opened for writing
IOStreamFailed thrown if an error occurred writing the data.



Protected Attributes

std::string filename
boost::iostreams::mapped_file_source sardata
std::fstream sarfile
uint64 sarray_start_offset
std::vector< int64 > seq_coords
 If Ns are masked, contains coordinates of regions without Ns.


Static Protected Attributes

char ** tmp_paths = NULL
 paths to scratch disk space that can be used for an external sort


Constructor & Destructor Documentation

mems::FileSML::FileSML ()  [inline]
 

Definition at line 37 of file FileSML.h.


Member Function Documentation

smlSeqI_t* mems::FileSML::base ()  [inline, protected]
 

Definition at line 111 of file FileSML.h.

References sardata, sarray_start_offset, and mems::smlSeqI_t.

void mems::FileSML::BigCreate (const genome::gnSequence & seq, const uint32 split_levels, const uint32 mersize = DNA_MER_SIZE)  [virtual]
 

Creates large sorted mer lists which do not fit entirely in memory.

BigCreate uses an external mergesort to create large sorted mer lists. It will divide the data a number of times specified by the split_levels parameter. Each split is written to temp files on disk and merged.

Parameters:
seq The sequence to create an SML for.
split_levels The number of times to divide the sequence in half.
mersize The size of the mers to sort on.
See also:
FileSML::Create

Definition at line 1721 of file FileSML.cpp.
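
The external mergesort described for BigCreate can be illustrated with a minimal, self-contained sketch. It does not use libMems: externalSort is a hypothetical helper, plain 32-bit integers stand in for mers, and std::tmpfile stands in for the registered temp paths. It assumes that halving the data split_levels times yields 2^split_levels pieces, which is one reading of the parameter description above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not part of libMems. Sorts `values` by splitting them
// into 2^split_levels pieces, sorting each piece in memory, spilling each
// piece to its own temp file, and merging the files back together.
std::vector<uint32_t> externalSort(const std::vector<uint32_t>& values,
                                   unsigned split_levels) {
    const std::size_t pieces = std::size_t(1) << split_levels;
    const std::size_t chunk = (values.size() + pieces - 1) / pieces;
    std::vector<std::FILE*> files;

    // Phase 1: sort each piece and write it to a temp file.
    for (std::size_t p = 0; p < pieces; ++p) {
        const std::size_t begin = std::min(p * chunk, values.size());
        const std::size_t end = std::min(begin + chunk, values.size());
        std::vector<uint32_t> piece(values.begin() + begin, values.begin() + end);
        std::sort(piece.begin(), piece.end());

        std::FILE* f = std::tmpfile();
        if (!f) throw std::runtime_error("could not create temp file");
        if (!piece.empty())
            std::fwrite(piece.data(), sizeof(uint32_t), piece.size(), f);
        std::rewind(f);
        files.push_back(f);
    }

    // Phase 2: merge. Keep the next unread value of every file in `head`
    // and repeatedly emit the smallest one.
    std::vector<uint32_t> head(pieces);
    std::vector<bool> live(pieces);
    for (std::size_t p = 0; p < pieces; ++p)
        live[p] = (std::fread(&head[p], sizeof(uint32_t), 1, files[p]) == 1);

    std::vector<uint32_t> merged;
    merged.reserve(values.size());
    for (;;) {
        std::size_t best = pieces;  // `pieces` means "no candidate yet"
        for (std::size_t p = 0; p < pieces; ++p)
            if (live[p] && (best == pieces || head[p] < head[best])) best = p;
        if (best == pieces) break;  // every file is exhausted
        merged.push_back(head[best]);
        live[best] =
            (std::fread(&head[best], sizeof(uint32_t), 1, files[best]) == 1);
    }

    for (std::FILE* f : files) std::fclose(f);  // tmpfile() files vanish on close
    return merged;
}
```

Only one value per temp file is held in memory during the merge, which is what lets this kind of sort handle data larger than RAM. A production version would buffer the reads.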

void mems::FileSML::Clear ()  [virtual]
 

Set data structures to default values.

Reimplemented from mems::SortedMerList.

Definition at line 38 of file FileSML.cpp.

References filename, sarfile, sarray_start_offset, and seq_coords.

virtual FileSML* mems::FileSML::Clone ()  const [pure virtual]
 

Implemented in mems::DNAFileSML.

void mems::FileSML::Create (const genome::gnSequence & seq, const uint64 seed)  [virtual]
 

Creates a new sorted mer list.

This function enumerates each possible mer of the specified size and sorts them alphabetically in order to construct a sorted mer list.

Parameters:
seq The sequence to create an SML for.
seed The seed that determines the mers to sort on.

Reimplemented from mems::SortedMerList.

Definition at line 1620 of file FileSML.cpp.
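
As a rough illustration of the idea behind Create, the sketch below enumerates every mer of a fixed size in a string and sorts the mer start positions alphabetically. It is not the libMems implementation: sortedMerPositions is a hypothetical name, std::string stands in for genome::gnSequence, and contiguous mers of length mersize are assumed in place of whatever the seed argument encodes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a sorted mer list: returns the start position of
// every mer of length `mersize` in `seq`, ordered alphabetically by mer.
std::vector<std::size_t> sortedMerPositions(const std::string& seq,
                                            std::size_t mersize) {
    std::vector<std::size_t> pos;
    if (mersize == 0 || seq.size() < mersize) return pos;

    // Enumerate every position where a full mer fits.
    for (std::size_t i = 0; i + mersize <= seq.size(); ++i) pos.push_back(i);

    // Sort positions by the mer they point at; ties keep sequence order.
    std::stable_sort(pos.begin(), pos.end(),
                     [&](std::size_t a, std::size_t b) {
                         return seq.compare(a, mersize, seq, b, mersize) < 0;
                     });
    return pos;
}
```

For the sequence GATTACA and a mer size of 3, the mers are GAT, ATT, TTA, TAC and ACA, so the sorted positions are 4, 1, 0, 3, 2.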

void mems::FileSML::dmCreate (const genome::gnSequence & seq, const uint64 seed)
 

Definition at line 1582 of file FileSML.cpp.

Referenced by mems::SearchLCBGaps().

uint32 mems::FileSML::FormatVersion ()  [inline, virtual]
 

Reimplemented in mems::DNAFileSML.

Definition at line 120 of file FileSML.h.

References mems::uint32.

Referenced by LoadFile().

virtual uint64 mems::FileSML::GetNeededMemory (gnSeqI len)  [protected, pure virtual]
 

Calculates and returns the amount of memory needed to create a sorted mer list for a sequence of the specified length.

Parameters:
len The length of the sequence
Returns:
The amount of memory needed in bytes.

Implemented in mems::DNAFileSML.

const char * mems::FileSML::getTempPath (int pathI)  [static]
 

Definition at line 1524 of file FileSML.cpp.

int mems::FileSML::getTempPathCount ()  [static]
 

Definition at line 1528 of file FileSML.cpp.

const std::vector< int64 >& mems::FileSML::getUsedCoordinates ()  const [inline]
 

Definition at line 84 of file FileSML.h.

References seq_coords.

void mems::FileSML::LoadFile (const std::string & fname)  [virtual]
 

Loads an existing sorted mer list from a file on disk.

Parameters:
fname The name of the file to load
Exceptions:
FileNotOpened thrown if the file could not be opened
FileUnreadable thrown if the file was corrupt or not a sorted mer list

Definition at line 46 of file FileSML.cpp.

References mems::SMLHeader::alphabet_bits, mems::SMLHeader::circular, filename, FormatVersion(), mems::SMLHeader::length, sardata, sarfile, sarray_start_offset, mems::SMLHeader::seed_length, mems::SortedMerList::seed_mask, mems::SMLHeader::seed_weight, seq_coords, mems::SortedMerList::SetMerMaskSize(), mems::uint32, and mems::uint64.

Referenced by mems::RepeatMatchList::LoadSMLs(), and mems::GenericMatchList< MatchPtrType >::LoadSMLs().

uint64 mems::FileSML::MemoryMinimum ()  [inline, static]
 

Definition at line 126 of file FileSML.h.

References DEFAULT_MEMORY_MINIMUM, mems::uint32, and mems::uint64.

void mems::FileSML::Merge (SortedMerList & sa, SortedMerList & sa2)  [virtual]
 

Merges two SortedMerLists.

Implements mems::SortedMerList.

Definition at line 1812 of file FileSML.cpp.

void mems::FileSML::OpenForWriting (boolean truncate = false)  [protected, virtual]
 

Reopens the sarfile fstream in read/write mode.

Exceptions:
FileNotOpened thrown if the file could not be opened for writing.

Definition at line 112 of file FileSML.cpp.

References filename, and sarfile.

Referenced by WriteHeader().

FileSML & mems::FileSML::operator= (const FileSML & sa)
 

Definition at line 23 of file FileSML.cpp.

References filename, sarfile, sarray_start_offset, and seq_coords.

bmer mems::FileSML::operator[] (gnSeqI index)  [virtual]
 

Get the mer at the specified index in the sorted mer list.

Parameters:
index The index of the mer to return.
Returns:
The specified mer.

Implements mems::SortedMerList.

Definition at line 1680 of file FileSML.cpp.

void mems::FileSML::RadixSort (std::vector< bmer > & s_array)  [virtual]
 

Definition at line 1761 of file FileSML.cpp.

boolean mems::FileSML::Read (std::vector< bmer > & readVector, gnSeqI size, gnSeqI offset = 0)  [virtual]
 

Read a range of mers in the sorted mer list.

This function reads a section of the sorted mer list, starting at 'offset' and continuing for 'size' mers. The mers are placed into readVector; anything already in readVector is cleared first. Returns false if there was a problem completing the read. If the end of the list is reached, all mers that could be read are placed into readVector and false is returned.

Parameters:
readVector the vector to read bmers into.
size The number of bmers to read.
offset The mer index in the sorted mer list to start reading from.
Returns:
false if a problem was encountered while reading.

Implements mems::SortedMerList.

Definition at line 1689 of file FileSML.cpp.
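
The contract documented for Read (clear the output, copy up to 'size' entries starting at 'offset', return false on a short read) can be sketched without libMems as follows. readRange is a hypothetical function, an in-memory vector of integers stands in for the on-disk list of bmer records, and treating an offset past the end as a failed read is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in: `list` plays the role of the sorted mer list and
// plain integers play the role of bmer records.
bool readRange(const std::vector<uint64_t>& list,
               std::vector<uint64_t>& readVector,
               std::size_t size, std::size_t offset = 0) {
    readVector.clear();                       // previous contents are discarded
    if (offset > list.size()) return false;   // assumption: out-of-range start fails

    // Copy as many entries as are available, up to `size`.
    const std::size_t n = std::min(size, list.size() - offset);
    readVector.assign(list.begin() + offset, list.begin() + offset + n);

    return n == size;                         // false when the list ended early
}
```

A caller that gets false back can still inspect readVector: it holds whatever could be read before the end of the list.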

void mems::FileSML::registerTempPath (const std::string & tmp_path)  [static]
 

Definition at line 1485 of file FileSML.cpp.

void mems::FileSML::SetDescription (const std::string & d)  [virtual]
 

Sets the freeform text description of the SML.

Reimplemented from mems::SortedMerList.

Definition at line 164 of file FileSML.cpp.

References mems::SMLHeader::description, DESCRIPTION_SIZE, and WriteHeader().

void mems::FileSML::SetID (const sarID_t d)  [virtual]
 

Ignore this.

Reimplemented from mems::SortedMerList.

Definition at line 169 of file FileSML.cpp.

References mems::SMLHeader::id, mems::sarID_t, and WriteHeader().

gnSeqI mems::FileSML::UniqueMerCount ()  [virtual]
 

Returns the number of unique mers in the sequence.

Reimplemented from mems::SortedMerList.

Definition at line 155 of file FileSML.cpp.

References mems::SMLHeader::unique_mers, and WriteHeader().
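
Because the list is sorted, equal mers sit next to each other, so a unique-mer count can be taken in a single pass. The sketch below shows that idea only. countUniqueMers is a hypothetical name, strings stand in for mers, and "unique" is assumed to mean "distinct"; the page does not say how libMems computes or caches the value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: counts distinct mers in an already sorted list by
// counting the positions where a mer differs from its predecessor.
std::size_t countUniqueMers(const std::vector<std::string>& sortedMers) {
    std::size_t unique = 0;
    for (std::size_t i = 0; i < sortedMers.size(); ++i) {
        if (i == 0 || sortedMers[i] != sortedMers[i - 1]) ++unique;
    }
    return unique;
}
```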

boolean mems::FileSML::WriteHeader ()  [protected, virtual]
 

Writes the SML header to disk.

Exceptions:
FileNotOpened thrown if the file could not be opened for writing
IOStreamFailed thrown if an error occurred writing the data.

Definition at line 129 of file FileSML.cpp.

References filename, OpenForWriting(), and sarfile.

Referenced by SetDescription(), SetID(), and UniqueMerCount().


Member Data Documentation

std::string mems::FileSML::filename [protected]
 

Definition at line 106 of file FileSML.h.

Referenced by Clear(), LoadFile(), OpenForWriting(), operator=(), and WriteHeader().

boost::iostreams::mapped_file_source mems::FileSML::sardata [protected]
 

Definition at line 110 of file FileSML.h.

Referenced by base(), and LoadFile().

std::fstream mems::FileSML::sarfile [protected]
 

Definition at line 107 of file FileSML.h.

Referenced by Clear(), LoadFile(), OpenForWriting(), operator=(), and WriteHeader().

uint64 mems::FileSML::sarray_start_offset [protected]
 

Definition at line 108 of file FileSML.h.

Referenced by base(), Clear(), LoadFile(), and operator=().

std::vector< int64 > mems::FileSML::seq_coords [protected]
 

If Ns are masked, contains coordinates of regions without Ns.

Definition at line 114 of file FileSML.h.

Referenced by Clear(), getUsedCoordinates(), LoadFile(), and operator=().
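
To make the idea of "regions without Ns" concrete, here is a small sketch that scans a sequence and records each maximal N-free run. The layout is an assumption: nFreeRegions is a hypothetical helper that stores each region as a 0-based inclusive start and end, flattened into one vector. The page does not state how seq_coords actually encodes its coordinates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: returns start,end pairs (0-based, inclusive), flattened
// into one vector, for every maximal run of `seq` that contains no 'N' or 'n'.
std::vector<int64_t> nFreeRegions(const std::string& seq) {
    std::vector<int64_t> coords;
    std::size_t i = 0;
    while (i < seq.size()) {
        if (seq[i] == 'N' || seq[i] == 'n') {  // skip masked positions
            ++i;
            continue;
        }
        const std::size_t start = i;
        while (i < seq.size() && seq[i] != 'N' && seq[i] != 'n') ++i;
        coords.push_back(static_cast<int64_t>(start));
        coords.push_back(static_cast<int64_t>(i - 1));
    }
    return coords;
}
```

For the sequence ACNNGTAN this yields 0, 1, 4, 6: one region covering AC and one covering GTA.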

char ** mems::FileSML::tmp_paths = NULL [static, protected]
 

paths to scratch disk space that can be used for an external sort

Definition at line 1483 of file FileSML.cpp.


The documentation for this class was generated from the following files:
FileSML.h
FileSML.cpp
Generated on Fri Mar 14 06:01:40 2008 for libMems by doxygen 1.3.6