#include "functions.h"
#include "helper.h"
#include "db.h"
#include "threads.h"
#include <math.h>
#include <stdlib.h>
#include <string.h>
Data Structures | |
struct | squares_fill_data |
Auxiliary structure. More... | |
Functions | |
void | invalidate (cache_type *cache) |
Invalidate entry in cache. | |
void | fill_squares (let_stat *where, void *data) |
Count squares and fills appropriate field in data structure. | |
bool | fill_squares_helper (const char *string, let_stat *stats, void *data) |
void | fill_squares (let_stat *where, let_stat *other, ch_key_type type, MY_FLOAT hint) |
Count squares and fills appropriate field in data structure. | |
MY_FLOAT | lalt (const tmpstats *stats, int position) |
Number of left alternatives. | |
MY_FLOAT | ralt (const tmpstats *stats, int position) |
Number of right alternatives. | |
MY_FLOAT | trian (const tmpstats *stats, int position) |
Number of triangles. | |
MY_FLOAT | expand_other_ending (int m, int statpos, const tmpstats *stats, int buffpos, char *buff, bool changed, let_stat *next, int minlen=0) |
This really counts squares. | |
MY_FLOAT | fsqrs (const tmpstats *stats, int pos, MY_FLOAT hint, int end) |
Count number of forward squares. | |
MY_FLOAT | bsqrs (const tmpstats *stats, int pos, MY_FLOAT hint, int end) |
Count number of backward squares. | |
MY_FLOAT | sqrs (const tmpstats *stats, cache_type *cache, int position, MY_FLOAT hint, int m) |
Number of squares. | |
MY_FLOAT | sindex (const tmpstats *stats, cache_type *cache, int position) |
Suffixality index according to the Economy principle. | |
MY_FLOAT | dtrian (const tmpstats *stats, int position) |
Growth of triangles. | |
MY_FLOAT | fentr (const tmpstats *stats, cache_type *cache, int position) |
Forward entropy. | |
MY_FLOAT | bentr (const tmpstats *stats, cache_type *cache, int position) |
Backward entropy. | |
MY_FLOAT | dfentr (const tmpstats *stats, cache_type *cache, int position) |
Growth of forward entropy. | |
MY_FLOAT | dbentr (const tmpstats *stats, cache_type *cache, int position) |
Growth of backward entropy. | |
Variables | |
unsigned int | longest |
Longest word from input file. |
See header for details.
MY_FLOAT bentr | ( | const tmpstats * | stats, | |
cache_type * | cache, | |||
int | position | |||
) |
Backward entropy.
Takes statistics and calculates backward entropy. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word.
stats | Statistics to use. | |
cache | Pointer to the cache structure. | |
position | Where do we want to cut the word. |
References tmpstats::backward, cache_type::bentropy, let_stat::data, and let_stat::occur.
Referenced by dbentr(), and evaluate().
MY_FLOAT bsqrs | ( | const tmpstats * | stats, | |
int | pos, | |||
MY_FLOAT | hint, | |||
int | end | |||
) |
Count number of backward squares.
This function counts number of squares sharing same ending segment. How does it differs from sqrs() function? That function count squares for some cut of the word. So it fixes both segments. This function let's beginning segment to be whatever it can be. This doesn't care about it as long as it makes square. Of course it uses triangles as heuristic. What other is different is that it stores it's cache directly to the tree of statistics.
stats | Pointers to the tree of statistics. | |
pos | Position in the word. | |
hint | How may triangles do we want? | |
end | Where is the end of statistics field. |
References fill_squares(), lock_let(), PRE_BSQUARES, unlock_let(), and VWPRINTF.
Referenced by evaluate().
MY_FLOAT dbentr | ( | const tmpstats * | stats, | |
cache_type * | cache, | |||
int | position | |||
) |
Growth of backward entropy.
Takes statistics and calculates growth of backward entropy. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word.
stats | Statistics to use. | |
cache | Pointer to the cache structure. | |
position | Where do we want to cut the word. |
References bentr().
Referenced by evaluate().
MY_FLOAT dfentr | ( | const tmpstats * | stats, | |
cache_type * | cache, | |||
int | position | |||
) |
Growth of forward entropy.
Takes statistics and calculates growth of forward entropy. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word.
stats | Statistics to use. | |
cache | Pointer to the cache structure. | |
position | Where do we want to cut the word. |
References fentr().
Referenced by evaluate().
MY_FLOAT dtrian | ( | const tmpstats * | stats, | |
int | position | |||
) |
Growth of triangles.
Takes statistics and returns growth of triangles. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word.
stats | Statistics to use. | |
position | Where do we want to cut the word. |
References trian().
MY_FLOAT expand_other_ending | ( | int | m, | |
int | statpos, | |||
const tmpstats * | stats, | |||
int | buffpos, | |||
char * | buff, | |||
bool | changed, | |||
let_stat * | next, | |||
int | minlen = 0 | |||
) |
This really counts squares.
It will find all possible ending segments and compare trees representing it's preceding segments.
References compare_trees(), let_stat::data, match_word(), minlen, and VWPRINTF.
Referenced by sqrs().
MY_FLOAT fentr | ( | const tmpstats * | stats, | |
cache_type * | cache, | |||
int | position | |||
) |
Forward entropy.
Takes statistics and calculates backward entropy. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word.
stats | Statistics to use. | |
cache | Pointer to the cache structure. | |
position | Where do we want to cut the word. |
References let_stat::data, cache_type::fentropy, tmpstats::forward, and let_stat::occur.
Referenced by dfentr(), and evaluate().
void fill_squares | ( | let_stat * | where, | |
let_stat * | other, | |||
ch_key_type | type, | |||
MY_FLOAT | hint = 1 | |||
) |
Count squares and fills appropriate field in data structure.
This function descends thought the tree of statistics counting squares. It need two arguments. First one is the beginning of the tree we want to proceed. Other one is the beginning of the backward tree.
where | Where to count squares. | |
other | Other end of tree used for backward statistics. | |
type | Howto save results. | |
hint | How many triangles use as heuristick. |
References let_stat::cached, CHECK_PTR, squares_fill_data::first_tree, for_each_segment(), squares_fill_data::other, squares_fill_data::save, squares_fill_data::type, and VWPRINTF.
void fill_squares | ( | let_stat * | where, | |
void * | data | |||
) |
Count squares and fills appropriate field in data structure.
This function descends thought the tree of statistics counting squares. It need two arguments. First one is the beginning of the tree we want to proceed. Other one is the beginning of the backward tree.
where | Where to count squares. | |
data | Other arguments packed in one structure. |
References fill_squares().
Referenced by bsqrs(), fill_squares(), and fsqrs().
MY_FLOAT fsqrs | ( | const tmpstats * | stats, | |
int | pos, | |||
MY_FLOAT | hint, | |||
int | end | |||
) |
Count number of forward squares.
This function counts number of squares sharing same beginning segment. How does it differs from sqrs() function? That function count squares for some cut of the word. So it fixes both segments. This function let's ending segment to be whatever it can be. This doesn't care about it as long as it makes square. Of course it uses triangles as heuristic. What other is different is that it stores it's cache directly to the tree of statistics.
stats | Pointers to the tree of statistics. | |
pos | Position in the word. | |
hint | How may triangles do we want? | |
end | Where is the end of statistics field. |
References fill_squares(), lock_let(), PRE_FSQUARES, unlock_let(), and VWPRINTF.
Referenced by evaluate().
void invalidate | ( | cache_type * | cache | ) |
Invalidate entry in cache.
Each method, which has some entries in cache, should add some piece of code to mark these entries as invalid. So nobody would use values from previous word.
References cache_type::bentropy, cache_type::fentropy, and cache_type::squares.
Referenced by find_it_thread().
MY_FLOAT lalt | ( | const tmpstats * | stats, | |
int | position | |||
) |
Number of left alternatives.
Takes statistics and returns left alternatives. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word.
stats | Statistics to use. | |
position | Where do we want to cut the word. |
Referenced by evaluate(), and trian().
MY_FLOAT ralt | ( | const tmpstats * | stats, | |
int | position | |||
) |
Number of right alternatives.
Takes statistics and returns right alternatives. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word.
stats | Statistics to use. | |
position | Where do we want to cut the word. |
Referenced by evaluate(), and trian().
MY_FLOAT sindex | ( | const tmpstats * | stats, | |
cache_type * | cache, | |||
int | position | |||
) |
Suffixality index according to the Economy principle.
Takes statistics and returns suffixality index according to the Economy principle. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word.
stats | Statistics to use. | |
cache | Pointer to the cache structure. | |
position | Where do we want to cut the word. |
References cache_type::sindex.
MY_FLOAT sqrs | ( | const tmpstats * | stats, | |
cache_type * | cache, | |||
int | position, | |||
MY_FLOAT | hint, | |||
int | m | |||
) |
Number of squares.
Takes statistics and returns number of squares. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word. This function uses number of triangles as a heuristic. We know that number of squares has to be less or equal to the number of triangles. So hint parameter specify lower bound for number of triangles. If less triangles then threshold is found, squares are not counted and zero is returned immediately. This can save some time, because counting squares is really time consuming.
stats | Statistics to use. | |
cache | Pointer to the cache structure. | |
position | Where do we want to cut the word. | |
hint | Number of triangles needed to count squares. | |
m | Length of the current word. | |
minlen | Minimal length of segments to appear in squares. |
References expand_other_ending(), longest, cache_type::squares, and trian().
Referenced by evaluate().
MY_FLOAT trian | ( | const tmpstats * | stats, | |
int | position | |||
) |
Number of triangles.
Takes statistics and returns number of triangles. Position argument means which cut are we thinking about. It's number of letters from the beginning of the word.
stats | Statistics to use. | |
position | Where do we want to cut the word. |
References lalt(), and ralt().
Referenced by dtrian(), evaluate(), and sqrs().
unsigned int longest |
Longest word from input file.
Longest word found.