functions.cpp File Reference

Implementation of methods. More...

#include "functions.h"
#include "helper.h"
#include "db.h"
#include "threads.h"
#include <math.h>
#include <stdlib.h>
#include <string.h>

Include dependency graph for functions.cpp:


Data Structures

struct  squares_fill_data
 Auxiliary structure. More...

Functions

void invalidate (cache_type *cache)
 Invalidate entry in cache.
void fill_squares (let_stat *where, void *data)
 Count squares and fills appropriate field in data structure.
bool fill_squares_helper (const char *string, let_stat *stats, void *data)
void fill_squares (let_stat *where, let_stat *other, ch_key_type type, MY_FLOAT hint)
 Count squares and fills appropriate field in data structure.
MY_FLOAT lalt (const tmpstats *stats, int position)
 Number of left alternatives.
MY_FLOAT ralt (const tmpstats *stats, int position)
 Number of right alternatives.
MY_FLOAT trian (const tmpstats *stats, int position)
 Number of triangles.
MY_FLOAT expand_other_ending (int m, int statpos, const tmpstats *stats, int buffpos, char *buff, bool changed, let_stat *next, int minlen=0)
 This really counts squares.
MY_FLOAT fsqrs (const tmpstats *stats, int pos, MY_FLOAT hint, int end)
 Count number of forward squares.
MY_FLOAT bsqrs (const tmpstats *stats, int pos, MY_FLOAT hint, int end)
 Count number of backward squares.
MY_FLOAT sqrs (const tmpstats *stats, cache_type *cache, int position, MY_FLOAT hint, int m)
 Number of squares.
MY_FLOAT sindex (const tmpstats *stats, cache_type *cache, int position)
 Suffixality index according to the Economy principle.
MY_FLOAT dtrian (const tmpstats *stats, int position)
 Growth of triangles.
MY_FLOAT fentr (const tmpstats *stats, cache_type *cache, int position)
 Forward entropy.
MY_FLOAT bentr (const tmpstats *stats, cache_type *cache, int position)
 Backward entropy.
MY_FLOAT dfentr (const tmpstats *stats, cache_type *cache, int position)
 Growth of forward entropy.
MY_FLOAT dbentr (const tmpstats *stats, cache_type *cache, int position)
 Growth of backward entropy.

Variables

unsigned int longest
 Longest word from input file.

Detailed Description

Implementation of methods.

See header for details.


Function Documentation

MY_FLOAT bentr ( const tmpstats stats,
cache_type cache,
int  position 
)

Backward entropy.

Takes statistics and calculates backward entropy. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word.

Parameters:
stats Statistics to use.
cache Pointer to the cache structure.
position Where do we want to cut the word.
Returns:
Backward entropy.

References tmpstats::backward, cache_type::bentropy, let_stat::data, and let_stat::occur.

Referenced by dbentr(), and evaluate().

MY_FLOAT bsqrs ( const tmpstats stats,
int  pos,
MY_FLOAT  hint,
int  end 
)

Count number of backward squares.

This function counts number of squares sharing same ending segment. How does it differs from sqrs() function? That function count squares for some cut of the word. So it fixes both segments. This function let's beginning segment to be whatever it can be. This doesn't care about it as long as it makes square. Of course it uses triangles as heuristic. What other is different is that it stores it's cache directly to the tree of statistics.

Parameters:
stats Pointers to the tree of statistics.
pos Position in the word.
hint How may triangles do we want?
end Where is the end of statistics field.
See also:
sqrs(), fsqrs()

References fill_squares(), lock_let(), PRE_BSQUARES, unlock_let(), and VWPRINTF.

Referenced by evaluate().

MY_FLOAT dbentr ( const tmpstats stats,
cache_type cache,
int  position 
)

Growth of backward entropy.

Takes statistics and calculates growth of backward entropy. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word.

Parameters:
stats Statistics to use.
cache Pointer to the cache structure.
position Where do we want to cut the word.
Returns:
Growth of backward entropy.

References bentr().

Referenced by evaluate().

MY_FLOAT dfentr ( const tmpstats stats,
cache_type cache,
int  position 
)

Growth of forward entropy.

Takes statistics and calculates growth of forward entropy. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word.

Parameters:
stats Statistics to use.
cache Pointer to the cache structure.
position Where do we want to cut the word.
Returns:
Growth of forward entropy.

References fentr().

Referenced by evaluate().

MY_FLOAT dtrian ( const tmpstats stats,
int  position 
)

Growth of triangles.

Takes statistics and returns growth of triangles. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word.

Parameters:
stats Statistics to use.
position Where do we want to cut the word.
Returns:
Growth of triangles.

References trian().

MY_FLOAT expand_other_ending ( int  m,
int  statpos,
const tmpstats stats,
int  buffpos,
char *  buff,
bool  changed,
let_stat next,
int  minlen = 0 
)

This really counts squares.

It will find all possible ending segments and compare trees representing it's preceding segments.

References compare_trees(), let_stat::data, match_word(), minlen, and VWPRINTF.

Referenced by sqrs().

MY_FLOAT fentr ( const tmpstats stats,
cache_type cache,
int  position 
)

Forward entropy.

Takes statistics and calculates backward entropy. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word.

Parameters:
stats Statistics to use.
cache Pointer to the cache structure.
position Where do we want to cut the word.
Returns:
Forward entropy.

References let_stat::data, cache_type::fentropy, tmpstats::forward, and let_stat::occur.

Referenced by dfentr(), and evaluate().

void fill_squares ( let_stat where,
let_stat other,
ch_key_type  type,
MY_FLOAT  hint = 1 
)

Count squares and fills appropriate field in data structure.

This function descends thought the tree of statistics counting squares. It need two arguments. First one is the beginning of the tree we want to proceed. Other one is the beginning of the backward tree.

Parameters:
where Where to count squares.
other Other end of tree used for backward statistics.
type Howto save results.
hint How many triangles use as heuristick.

References let_stat::cached, CHECK_PTR, squares_fill_data::first_tree, for_each_segment(), squares_fill_data::other, squares_fill_data::save, squares_fill_data::type, and VWPRINTF.

void fill_squares ( let_stat where,
void *  data 
)

Count squares and fills appropriate field in data structure.

This function descends thought the tree of statistics counting squares. It need two arguments. First one is the beginning of the tree we want to proceed. Other one is the beginning of the backward tree.

Parameters:
where Where to count squares.
data Other arguments packed in one structure.
See also:
fill_squares(), fill_squares_struct

References fill_squares().

Referenced by bsqrs(), fill_squares(), and fsqrs().

MY_FLOAT fsqrs ( const tmpstats stats,
int  pos,
MY_FLOAT  hint,
int  end 
)

Count number of forward squares.

This function counts number of squares sharing same beginning segment. How does it differs from sqrs() function? That function count squares for some cut of the word. So it fixes both segments. This function let's ending segment to be whatever it can be. This doesn't care about it as long as it makes square. Of course it uses triangles as heuristic. What other is different is that it stores it's cache directly to the tree of statistics.

Parameters:
stats Pointers to the tree of statistics.
pos Position in the word.
hint How may triangles do we want?
end Where is the end of statistics field.
See also:
sqrs(), bsqrs()

References fill_squares(), lock_let(), PRE_FSQUARES, unlock_let(), and VWPRINTF.

Referenced by evaluate().

void invalidate ( cache_type cache  ) 

Invalidate entry in cache.

Each method, which has some entries in cache, should add some piece of code to mark these entries as invalid. So nobody would use values from previous word.

References cache_type::bentropy, cache_type::fentropy, and cache_type::squares.

Referenced by find_it_thread().

MY_FLOAT lalt ( const tmpstats stats,
int  position 
)

Number of left alternatives.

Takes statistics and returns left alternatives. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word.

Parameters:
stats Statistics to use.
position Where do we want to cut the word.
Returns:
Number of beginning segments which can end in this way.

Referenced by evaluate(), and trian().

MY_FLOAT ralt ( const tmpstats stats,
int  position 
)

Number of right alternatives.

Takes statistics and returns right alternatives. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word.

Parameters:
stats Statistics to use.
position Where do we want to cut the word.
Returns:
Number of ending segments which can end this segment.

Referenced by evaluate(), and trian().

MY_FLOAT sindex ( const tmpstats stats,
cache_type cache,
int  position 
)

Suffixality index according to the Economy principle.

Takes statistics and returns suffixality index according to the Economy principle. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word.

Parameters:
stats Statistics to use.
cache Pointer to the cache structure.
position Where do we want to cut the word.
Returns:
Suffixality index.

References cache_type::sindex.

MY_FLOAT sqrs ( const tmpstats stats,
cache_type cache,
int  position,
MY_FLOAT  hint,
int  m 
)

Number of squares.

Takes statistics and returns number of squares. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word. This function uses number of triangles as a heuristic. We know that number of squares has to be less or equal to the number of triangles. So hint parameter specify lower bound for number of triangles. If less triangles then threshold is found, squares are not counted and zero is returned immediately. This can save some time, because counting squares is really time consuming.

Parameters:
stats Statistics to use.
cache Pointer to the cache structure.
position Where do we want to cut the word.
hint Number of triangles needed to count squares.
m Length of the current word.
minlen Minimal length of segments to appear in squares.
Returns:
Number of squares.
See also:
trian(), fsqrs(), bsqrs()

References expand_other_ending(), longest, cache_type::squares, and trian().

Referenced by evaluate().

MY_FLOAT trian ( const tmpstats stats,
int  position 
)

Number of triangles.

Takes statistics and returns number of triangles. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word.

Parameters:
stats Statistics to use.
position Where do we want to cut the word.
Returns:
Number of triangles.

References lalt(), and ralt().

Referenced by dtrian(), evaluate(), and sqrs().


Variable Documentation

unsigned int longest

Longest word from input file.

Longest word found.


Generated on Sat Dec 20 11:32:18 2008 for Affisix by  doxygen 1.5.7.1