functions.h File Reference

Module with methods definition. More...

#include "db.h"

Include dependency graph for functions.h:

This graph shows which files directly or indirectly include this file:

Go to the source code of this file.

Data Structures

struct  cache_type
 Cache to speedup computing. More...
struct  fill_squares_struct
 This structure is use to provide parameters for fill_squares() function. More...

Functions

void invalidate (cache_type *cache)
 Invalidate entry in cache.
void fill_squares (let_stat *where, let_stat *other, ch_key_type type, MY_FLOAT hint=1)
 Count squares and fills appropriate field in data structure.
void fill_squares (let_stat *where, void *data)
 Count squares and fills appropriate field in data structure.
MY_FLOAT lalt (const tmpstats *stats, int position)
 Number of left alternatives.
MY_FLOAT ralt (const tmpstats *stats, int position)
 Number of right alternatives.
MY_FLOAT trian (const tmpstats *stats, int position)
 Number of triangles.
MY_FLOAT sqrs (const tmpstats *stats, cache_type *cache, int position, MY_FLOAT hint, int m)
 Number of squares.
MY_FLOAT fsqrs (const tmpstats *stats, int pos, MY_FLOAT hint, int end)
 Count number of forward squares.
MY_FLOAT bsqrs (const tmpstats *stats, int pos, MY_FLOAT hint, int end)
 Count number of backward squares.
MY_FLOAT sindex (const tmpstats *stats, cache_type *cache, int position)
 Suffixality index according to the Economy principle.
MY_FLOAT dtrian (const tmpstats *stats, int position)
 Growth of triangles.
MY_FLOAT fentr (const tmpstats *stats, cache_type *cache, int position)
 Forward entropy.
MY_FLOAT bentr (const tmpstats *stats, cache_type *cache, int position)
 Backward entropy.
MY_FLOAT dfentr (const tmpstats *stats, cache_type *cache, int position)
 Growth of forward entropy.
MY_FLOAT dbentr (const tmpstats *stats, cache_type *cache, int position)
 Growth of backward entropy.


Detailed Description

Module with methods definition.

This file contains functions representing various methods for prefix recognition. It's used by evaluator. Each method should take as an arguments current pointer to the structure tmpstats cause it's only way, how to get some statistics. And each function has too return float with result.


Function Documentation

MY_FLOAT bentr ( const tmpstats stats,
cache_type cache,
int  position 
)

Backward entropy.

Takes statistics and calculates backward entropy. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word.

Parameters:
stats Statistics to use.
cache Pointer to the cache structure.
position Where do we want to cut the word.
Returns:
Backward entropy.

References tmpstats::backward, cache_type::bentropy, let_stat::data, and let_stat::occur.

Referenced by dbentr(), and evaluate().

MY_FLOAT bsqrs ( const tmpstats stats,
int  pos,
MY_FLOAT  hint,
int  end 
)

Count number of backward squares.

This function counts number of squares sharing same ending segment. How does it differs from sqrs() function? That function count squares for some cut of the word. So it fixes both segments. This function let's beginning segment to be whatever it can be. This doesn't care about it as long as it makes square. Of course it uses triangles as heuristic. What other is different is that it stores it's cache directly to the tree of statistics.

Parameters:
stats Pointers to the tree of statistics.
pos Position in the word.
hint How may triangles do we want?
end Where is the end of statistics field.
See also:
sqrs(), fsqrs()

References fill_squares(), lock_let(), PRE_BSQUARES, unlock_let(), and VWPRINTF.

Referenced by evaluate().

MY_FLOAT dbentr ( const tmpstats stats,
cache_type cache,
int  position 
)

Growth of backward entropy.

Takes statistics and calculates growth of backward entropy. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word.

Parameters:
stats Statistics to use.
cache Pointer to the cache structure.
position Where do we want to cut the word.
Returns:
Growth of backward entropy.

References bentr().

Referenced by evaluate().

MY_FLOAT dfentr ( const tmpstats stats,
cache_type cache,
int  position 
)

Growth of forward entropy.

Takes statistics and calculates growth of forward entropy. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word.

Parameters:
stats Statistics to use.
cache Pointer to the cache structure.
position Where do we want to cut the word.
Returns:
Growth of forward entropy.

References fentr().

Referenced by evaluate().

MY_FLOAT dtrian ( const tmpstats stats,
int  position 
)

Growth of triangles.

Takes statistics and returns growth of triangles. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word.

Parameters:
stats Statistics to use.
position Where do we want to cut the word.
Returns:
Growth of triangles.

References trian().

MY_FLOAT fentr ( const tmpstats stats,
cache_type cache,
int  position 
)

Forward entropy.

Takes statistics and calculates backward entropy. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word.

Parameters:
stats Statistics to use.
cache Pointer to the cache structure.
position Where do we want to cut the word.
Returns:
Forward entropy.

References let_stat::data, cache_type::fentropy, tmpstats::forward, and let_stat::occur.

Referenced by dfentr(), and evaluate().

void fill_squares ( let_stat where,
void *  data 
)

Count squares and fills appropriate field in data structure.

This function descends thought the tree of statistics counting squares. It need two arguments. First one is the beginning of the tree we want to proceed. Other one is the beginning of the backward tree.

Parameters:
where Where to count squares.
data Other arguments packed in one structure.
See also:
fill_squares(), fill_squares_struct

References fill_squares().

Referenced by bsqrs(), fill_squares(), and fsqrs().

void fill_squares ( let_stat where,
let_stat other,
ch_key_type  type,
MY_FLOAT  hint = 1 
)

Count squares and fills appropriate field in data structure.

This function descends thought the tree of statistics counting squares. It need two arguments. First one is the beginning of the tree we want to proceed. Other one is the beginning of the backward tree.

Parameters:
where Where to count squares.
other Other end of tree used for backward statistics.
type Howto save results.
hint How many triangles use as heuristick.

References let_stat::cached, CHECK_PTR, squares_fill_data::first_tree, for_each_segment(), squares_fill_data::other, squares_fill_data::save, squares_fill_data::type, and VWPRINTF.

MY_FLOAT fsqrs ( const tmpstats stats,
int  pos,
MY_FLOAT  hint,
int  end 
)

Count number of forward squares.

This function counts number of squares sharing same beginning segment. How does it differs from sqrs() function? That function count squares for some cut of the word. So it fixes both segments. This function let's ending segment to be whatever it can be. This doesn't care about it as long as it makes square. Of course it uses triangles as heuristic. What other is different is that it stores it's cache directly to the tree of statistics.

Parameters:
stats Pointers to the tree of statistics.
pos Position in the word.
hint How may triangles do we want?
end Where is the end of statistics field.
See also:
sqrs(), bsqrs()

References fill_squares(), lock_let(), PRE_FSQUARES, unlock_let(), and VWPRINTF.

Referenced by evaluate().

void invalidate ( cache_type cache  ) 

Invalidate entry in cache.

Each method, which has some entries in cache, should add some piece of code to mark these entries as invalid. So nobody would use values from previous word.

References cache_type::bentropy, cache_type::fentropy, and cache_type::squares.

Referenced by find_it_thread().

MY_FLOAT lalt ( const tmpstats stats,
int  position 
)

Number of left alternatives.

Takes statistics and returns left alternatives. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word.

Parameters:
stats Statistics to use.
position Where do we want to cut the word.
Returns:
Number of beginning segments which can end in this way.

Referenced by evaluate(), and trian().

MY_FLOAT ralt ( const tmpstats stats,
int  position 
)

Number of right alternatives.

Takes statistics and returns right alternatives. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word.

Parameters:
stats Statistics to use.
position Where do we want to cut the word.
Returns:
Number of ending segments which can end this segment.

Referenced by evaluate(), and trian().

MY_FLOAT sindex ( const tmpstats stats,
cache_type cache,
int  position 
)

Suffixality index according to the Economy principle.

Takes statistics and returns suffixality index according to the Economy principle. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word.

Parameters:
stats Statistics to use.
cache Pointer to the cache structure.
position Where do we want to cut the word.
Returns:
Suffixality index.

References cache_type::sindex.

MY_FLOAT sqrs ( const tmpstats stats,
cache_type cache,
int  position,
MY_FLOAT  hint,
int  m 
)

Number of squares.

Takes statistics and returns number of squares. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word. This function uses number of triangles as a heuristic. We know that number of squares has to be less or equal to the number of triangles. So hint parameter specify lower bound for number of triangles. If less triangles then threshold is found, squares are not counted and zero is returned immediately. This can save some time, because counting squares is really time consuming.

Parameters:
stats Statistics to use.
cache Pointer to the cache structure.
position Where do we want to cut the word.
hint Number of triangles needed to count squares.
m Length of the current word.
minlen Minimal length of segments to appear in squares.
Returns:
Number of squares.
See also:
trian(), fsqrs(), bsqrs()

References expand_other_ending(), longest, cache_type::squares, and trian().

Referenced by evaluate().

MY_FLOAT trian ( const tmpstats stats,
int  position 
)

Number of triangles.

Takes statistics and returns number of triangles. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word.

Parameters:
stats Statistics to use.
position Where do we want to cut the word.
Returns:
Number of triangles.

References lalt(), and ralt().

Referenced by dtrian(), evaluate(), and sqrs().


Generated on Sat Dec 20 11:32:18 2008 for Affisix by  doxygen 1.5.7.1