#include <map>
#include "config.h"
#include "helper.h"
Data Structures | |
struct | tmpstats |
New data structure for statistics. | |
struct | let_stat |
Statistics about letter. | |
struct | db |
Database itself. | |
Typedefs | |
typedef u_int8_t | ch_key_type |
Datatype for key used to save cached statistics. | |
typedef std::map< ch_key_type, void * > | cached_data |
List with cached data. | |
typedef std::map< CHAR, let_stat * > | followers |
List of followers. | |
Enumerations | |
enum | cache_keys { PRE_FSQUARES = 1, PRE_BSQUARES } |
Enum with possible keys to cached statistics map. | |
Functions | |
db * | init_db () |
Initialization of database. | |
void | compute_stats (db *stats, char *from) |
Computes statistics. | |
void | destroy_db (db *what) |
Free database. | |
bool | for_each_segment (let_stat *where, const char *begin, const char *end, bool(*function)(const char *, let_stat *, void *), void *data) |
Runs specific function for each word from selected tree. | |
void | for_each_thread_node (let_stat *where, void(*function)(let_stat *, void *), void *data) |
Runs specific function for each node in selected tree. | |
MY_FLOAT | compare_trees (let_stat *first, let_stat *second, int mindepth=0) |
This function compares trees. | |
let_stat * | match_word (let_stat *from, const char *word, bool reverse=true) |
This function finds selected word in the tree. |
Two trees are used for computing and maintaining statistics. One is constructed from the beginnings of the words and one from their endings. For each letter, a tree contains all possible following letters and the number of their occurrences. Each tree begins with the empty word in its root, so you can get the number of words in the whole file simply by reading db.forward.occur. Another important feature is that word endings are also marked in the trees: 0 is inserted at the end of every word.
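The layout described above can be sketched roughly as follows. This is only an illustration under assumptions: `Node`, `Trees`, `insert_word` and `add` are hypothetical names, and the real `let_stat` and `db` structures may hold more fields and use different ownership.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for let_stat: an occurrence counter plus followers.
struct Node {
    std::size_t occur = 0;                        // words that passed through this node
    std::map<char, std::unique_ptr<Node>> next;   // following letters
};

// Insert one word. The terminating '\0' is stored as a follower as well,
// so complete words can be told apart from mere prefixes.
void insert_word(Node& root, const std::string& word) {
    Node* cur = &root;
    ++cur->occur;                                 // the root counts every word
    for (std::size_t i = 0; i <= word.size(); ++i) {   // <= also visits the '\0'
        char c = (i < word.size()) ? word[i] : '\0';
        auto& child = cur->next[c];
        if (!child) child = std::make_unique<Node>();
        cur = child.get();
        ++cur->occur;
    }
}

// Hypothetical pair of trees mirroring db.forward and db.backward.
struct Trees {
    Node forward;
    Node backward;
};

// Add a word to both trees; the backward tree receives the reversed word.
void add(Trees& t, const std::string& word) {
    insert_word(t.forward, word);
    insert_word(t.backward, std::string(word.rbegin(), word.rend()));
}
```

In this sketch, reading `t.forward.occur` after a series of `add` calls gives the total number of words, which corresponds to reading db.forward.occur in the text above.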
enum cache_keys |
This function compares trees.
It descends through the trees and compares their members. It returns the number of words common to both trees. If you specify the optional argument, it counts only common words longer than the specified value.
References compare_trees(), let_stat::data, and let_stat::occur.
Referenced by compare_trees(), and expand_other_ending().
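A simplified sketch of such a comparison is given below. It assumes that a word counts as common when both trees contain the same path ending in the 0 marker, and it returns a plain count. The names `Node`, `insert_word` and `count_common` are hypothetical, and the real compare_trees() returns MY_FLOAT and may weight its result differently.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

struct Node {                                     // hypothetical stand-in for let_stat
    std::map<char, std::unique_ptr<Node>> next;
};

// Insert a word including its terminating '\0' marker.
void insert_word(Node& root, const std::string& word) {
    Node* cur = &root;
    for (std::size_t i = 0; i <= word.size(); ++i) {
        auto& child = cur->next[i < word.size() ? word[i] : '\0'];
        if (!child) child = std::make_unique<Node>();
        cur = child.get();
    }
}

// Count words present in both trees whose length exceeds mindepth.
std::size_t count_common(const Node& a, const Node& b,
                         std::size_t mindepth = 0, std::size_t depth = 0) {
    std::size_t total = 0;
    for (const auto& entry : a.next) {
        auto it = b.next.find(entry.first);
        if (it == b.next.end()) continue;         // letter missing in the second tree
        if (entry.first == '\0') {
            if (depth > mindepth) ++total;        // a complete word ends here in both
        } else {
            total += count_common(*entry.second, *it->second, mindepth, depth + 1);
        }
    }
    return total;
}
```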
void compute_stats | ( | db * | stats, | |
char * | from | |||
) |
Computes statistics.
This fills the database with statistics from the selected file. It computes all occurrences and also computes predefined statistics according to the preconputemask mask (a global variable).
stats | Database to fill. | |
from | Input file. |
References db::backward, let_stat::data, db::forward, getword(), longest, let_stat::occur, VPRINTF, VWPRINTF, and WPRINTF.
Referenced by main().
bool for_each_segment | ( | let_stat * | where, | |
const char * | begin, | |||
const char * | end, | |||
bool(*)(const char *, let_stat *, void *) | function, | |||
void * | data | |||
) |
Runs specific function for each word from selected tree.
This function takes two words (represented as strings) and runs the specified function for each word that is present in the provided tree and is greater than the first one and less than the second one. You can use NULL to omit either bound. Data can be exchanged via an object (void*) that is also an argument of this function.
The function passed as the main argument should take a string representing the current word as its first argument and a pointer to the statistics for this word (a pointer to the last statistics element) as its second argument. The last argument carries our data (provided to for_each * function). The function should return a boolean value. Once it returns false, iteration stops.
where | Which tree we want to process. | |
begin | Lower bound of the range. | |
end | Upper bound of the range. | |
function | What do we want to do. | |
data | Data object passed to the function. |
References for_each_segment_helper(), and longest.
Referenced by fill_squares().
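The callback contract described for for_each_segment() can be illustrated with a small stand-in. Everything here is an assumption made for the example: `Stat`, `visit_range` and `collect_three` are hypothetical, a sorted `std::map` replaces the letter tree, and returning false after an early stop is a choice of this sketch, since the documentation does not say what the real return value means.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Stat { unsigned occur; };                  // hypothetical stand-in for let_stat

using Callback = bool (*)(const char*, Stat*, void*);

// Stand-in for for_each_segment: visits words strictly between begin and end
// in lexicographic order. A null bound means "unbounded" on that side.
bool visit_range(std::map<std::string, Stat>& words,
                 const char* begin, const char* end,
                 Callback function, void* data) {
    for (auto& entry : words) {
        if (begin && entry.first.compare(begin) <= 0) continue;  // not above lower bound
        if (end && entry.first.compare(end) >= 0) break;         // reached upper bound
        if (!function(entry.first.c_str(), &entry.second, data)) return false;
    }
    return true;
}

// Example callback: collect words through the void* data object,
// and ask the iteration to stop once three words were collected.
bool collect_three(const char* word, Stat*, void* data) {
    auto* out = static_cast<std::vector<std::string>*>(data);
    out->push_back(word);
    return out->size() < 3;
}
```

The `void*` argument is the only channel between caller and callback, so the callback has to cast it back to the type the caller passed in.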
Runs specific function for each node in selected tree.
This function takes the root of a tree in the database and runs the specified function for each member of this tree. Data can be exchanged via an object (void*) that is also an argument of this function. This function runs several threads, so be careful: you are responsible for making your function thread safe ;-) To make that easier, note that your function is never called twice for the same node. The other thing to pay attention to is that you should wait for all children to exit after calling this function. This is a feature, not a bug: you may want to run something else in the meantime to keep your processor busy ;-) For example, you may want to run the same function with a different argument.
The function passed as the main argument should take a pointer to the current node as its first argument. The second argument carries our data (provided to for_each function).
where | Which tree we want to process. | |
function | What do we want to do. | |
data | Data object passed to the function. |
References CHECK_PTR, let_stat::data, for_each_thread_node_structure::data, for_each_thread_node(), for_each_thread_node_helper(), for_each_thread_node_structure::function, and for_each_thread_node_structure::where.
Referenced by for_each_thread_node().
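One way to keep a callback thread safe is to route all shared state through an atomic object, as in the sketch below. It is an illustration under assumptions: `Node`, `visit_subtree`, `visit_threaded` and `sum_occurrences` are hypothetical, `std::thread` stands in for whatever mechanism the library really uses, and handing the threads back to the caller is just one way to model "wait for all children after calling".

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <memory>
#include <thread>
#include <vector>

struct Node {                                     // hypothetical stand-in for let_stat
    unsigned occur = 0;
    std::map<char, std::unique_ptr<Node>> next;
};

using NodeCallback = void (*)(Node*, void*);

// Visit a subtree sequentially; each node is passed to the callback once.
void visit_subtree(Node* node, NodeCallback function, void* data) {
    function(node, data);
    for (auto& entry : node->next)
        visit_subtree(entry.second.get(), function, data);
}

// Stand-in for for_each_thread_node: one thread per child of the root.
// The threads are returned so the caller decides when to wait for them.
std::vector<std::thread> visit_threaded(Node* root, NodeCallback function, void* data) {
    assert(root != nullptr);
    function(root, data);
    std::vector<std::thread> workers;
    for (auto& entry : root->next)
        workers.emplace_back(visit_subtree, entry.second.get(), function, data);
    return workers;
}

// Thread-safe callback: the shared total is atomic, so concurrent adds are safe.
void sum_occurrences(Node* node, void* data) {
    static_cast<std::atomic<unsigned long>*>(data)->fetch_add(node->occur);
}
```

Because no node is visited twice, per-node state needs no locking in this sketch. Only the shared accumulator does, and the caller must join every returned thread before reading it.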
This function finds selected word in the tree.
It descends through the tree and tries to match the provided word. If the word is not present, it returns NULL; otherwise it returns a pointer to the last let_stat.
from | Where to begin with searching. | |
word | What we are looking for. | |
reverse | Start matching from the end of the string. |
References let_stat::data.
Referenced by expand_other_ending().
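The descent described for match_word() might look like the following sketch. It assumes that matching follows the letters of the word only and does not require the 0 end marker, which the documentation does not specify. The names `Node`, `insert_word` and `find_word` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

struct Node {                                     // hypothetical stand-in for let_stat
    std::size_t occur = 0;
    std::map<char, std::unique_ptr<Node>> next;
};

// Insert a word including its terminating '\0' marker, counting occurrences.
void insert_word(Node& root, const std::string& word) {
    Node* cur = &root;
    ++cur->occur;
    for (std::size_t i = 0; i <= word.size(); ++i) {
        auto& child = cur->next[i < word.size() ? word[i] : '\0'];
        if (!child) child = std::make_unique<Node>();
        cur = child.get();
        ++cur->occur;
    }
}

// Follow the word letter by letter, optionally reading it from its end.
// Returns the node of the last matched letter, or nullptr if the path is missing.
const Node* find_word(const Node& from, const std::string& word, bool reverse = true) {
    const Node* cur = &from;
    for (std::size_t i = 0; i < word.size(); ++i) {
        char c = reverse ? word[word.size() - 1 - i] : word[i];
        auto it = cur->next.find(c);
        if (it == cur->next.end()) return nullptr;
        cur = it->second.get();
    }
    return cur;
}
```

With `reverse` set, the same call works against a tree built from word endings without the caller having to reverse the string first.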