tesseract::TessdataManager Class Reference

#include <tessdatamanager.h>

Public Member Functions

 TessdataManager ()
 ~TessdataManager ()
void Init (const char *data_file_name)
FILE * GetDataFilePtr () const
bool SeekToStart (TessdataType tessdata_type)
inT64 GetEndOffset (TessdataType tessdata_type) const
void End ()
bool OverwriteComponents (const char *new_traineddata_filename, char **component_filenames, int num_new_components)
bool ExtractToFile (const char *filename)

Static Public Member Functions

static void WriteMetadata (inT64 *offset_table, FILE *output_file)
static bool CombineDataFiles (const char *language_data_path_prefix, const char *output_filename)
static void CopyFile (FILE *input_file, FILE *output_file, bool newline_end, inT64 num_bytes_to_copy)
static bool TessdataTypeFromFileSuffix (const char *suffix, TessdataType *type, bool *text_file)
static bool TessdataTypeFromFileName (const char *filename, TessdataType *type, bool *text_file)

Constructor & Destructor Documentation

tesseract::TessdataManager::TessdataManager (  )  [inline]
tesseract::TessdataManager::~TessdataManager (  )  [inline]

Member Function Documentation

bool tesseract::TessdataManager::CombineDataFiles ( const char *  language_data_path_prefix,
const char *  output_filename 
) [static]

Reads all the standard tesseract config and data files for a language at the given path and bundles them up into one binary data file. Returns true if the combined traineddata file was successfully written.
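
As a rough illustration of what bundling several components into one file involves, the sketch below computes where each component would start in the combined file. The layout is an assumption made for the example (a 32-bit entry count, one 64-bit offset per slot, then the present components back to back), and `ComputeOffsets` is a hypothetical helper, not part of TessdataManager.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not Tesseract code.
// component_sizes[i] is the byte size of component i, or a negative value
// if that component file is missing. Returns the start offset of each
// component in the combined file, with -1 for missing components.
std::vector<std::int64_t> ComputeOffsets(
    const std::vector<std::int64_t> &component_sizes) {
  // Assumed header: int32 entry count followed by one int64 per slot.
  const std::int64_t header_bytes = static_cast<std::int64_t>(
      sizeof(std::int32_t) + component_sizes.size() * sizeof(std::int64_t));
  std::vector<std::int64_t> offsets(component_sizes.size(), -1);
  std::int64_t next = header_bytes;
  for (std::size_t i = 0; i < component_sizes.size(); ++i) {
    if (component_sizes[i] < 0) continue;  // missing component keeps -1
    offsets[i] = next;
    next += component_sizes[i];
  }
  assert(next >= header_bytes);  // data never starts inside the header
  return offsets;
}
```

With three slots of sizes 10, missing, and 5, the assumed header is 28 bytes, so the components would start at offsets 28 and 38.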

void tesseract::TessdataManager::CopyFile ( FILE *  input_file,
FILE *  output_file,
bool  newline_end,
inT64  num_bytes_to_copy 
) [static]

Copies data from input_file to output_file. If num_bytes_to_copy is >= 0, only that many bytes are copied from the input file; otherwise all the data in the input file is copied.
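
The byte-count rule can be sketched with a small buffered loop. `CopyBytes` is a hypothetical stand-in written for this example; it leaves out the newline_end parameter, whose behavior is not described here, and the 4096-byte buffer is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical stand-in for the copy rule described above, not Tesseract code.
// Copies num_bytes_to_copy bytes from input to output, or everything up to
// end-of-file when num_bytes_to_copy is negative. Returns the bytes copied.
std::int64_t CopyBytes(std::FILE *input, std::FILE *output,
                       std::int64_t num_bytes_to_copy) {
  assert(input != nullptr && output != nullptr);
  char buffer[4096];
  std::int64_t total = 0;
  for (;;) {
    std::size_t want = sizeof(buffer);
    if (num_bytes_to_copy >= 0) {
      const std::int64_t remaining = num_bytes_to_copy - total;
      if (remaining <= 0) break;  // requested amount already copied
      if (remaining < static_cast<std::int64_t>(want)) {
        want = static_cast<std::size_t>(remaining);
      }
    }
    const std::size_t got = std::fread(buffer, 1, want, input);
    if (got == 0) break;  // end of file or read error
    if (std::fwrite(buffer, 1, got, output) != got) break;  // write error
    total += static_cast<std::int64_t>(got);
  }
  return total;
}
```

Returning the number of bytes copied is a choice made for the sketch so that callers can detect a short copy; the documented function returns void.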

void tesseract::TessdataManager::End (  )  [inline]

Closes data_file_ (if it was opened by Init()).

bool tesseract::TessdataManager::ExtractToFile ( const char *  filename  ) 

Extracts the tessdata component implied by the given filename from the combined traineddata loaded into TessdataManager, and writes the extracted component to that file. E.g. if the filename given is somepath/somelang.unicharset, unicharset will be extracted from the data loaded into the TessdataManager and will be written to somepath/somelang.unicharset.

Returns:
true if the component was successfully extracted, false if the component was not present in the traineddata loaded into TessdataManager.

FILE* tesseract::TessdataManager::GetDataFilePtr (  )  const [inline]

Returns data file pointer.

inT64 tesseract::TessdataManager::GetEndOffset ( TessdataType  tessdata_type  )  const [inline]

Returns the end offset for the given tesseract data file type.

void tesseract::TessdataManager::Init ( const char *  data_file_name  ) 

Opens the given data file and reads the offset table.
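
Reading an offset table from the start of a file might look like the sketch below. The on-disk layout (a 32-bit entry count followed by that many 64-bit offsets, in host byte order), the `kMaxEntries` sanity bound, and the name `ReadOffsetTable` are all assumptions made for the example rather than details taken from this reference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical sanity bound on the entry count, chosen for this sketch.
const std::int32_t kMaxEntries = 1024;

// Hypothetical reader, not Tesseract code. Assumed layout: an int32 entry
// count followed by that many int64 offsets, in host byte order.
bool ReadOffsetTable(std::FILE *data_file,
                     std::vector<std::int64_t> *offsets) {
  assert(data_file != nullptr && offsets != nullptr);
  std::int32_t num_entries = 0;
  if (std::fread(&num_entries, sizeof(num_entries), 1, data_file) != 1) {
    return false;  // file too short to hold the count
  }
  if (num_entries < 0 || num_entries > kMaxEntries) return false;
  offsets->assign(static_cast<std::size_t>(num_entries), -1);
  if (num_entries == 0) return true;
  const std::size_t wanted = static_cast<std::size_t>(num_entries);
  return std::fread(offsets->data(), sizeof(std::int64_t), wanted,
                    data_file) == wanted;
}
```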

bool tesseract::TessdataManager::OverwriteComponents ( const char *  new_traineddata_filename,
char **  component_filenames,
int  num_new_components 
)

Gets the individual components from the data_file_ with which the class was initialized. Overwrites the components specified by component_filenames. Writes the updated traineddata file to new_traineddata_filename.

bool tesseract::TessdataManager::SeekToStart ( TessdataType  tessdata_type  )  [inline]

Returns false if there is no data of the given type. Otherwise does a seek on the data_file_ to position the pointer at the start of the data of the given type.
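
A minimal sketch of this lookup, together with one plausible way to derive an end offset as GetEndOffset() does, is shown below. It assumes that an offset of -1 marks an absent component and that a component ends one byte before the next present component begins; both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical helpers, not Tesseract code. Assumption: a negative offset
// marks a component that is not present in the file.
bool SeekToComponent(std::FILE *data_file,
                     const std::vector<std::int64_t> &offsets,
                     std::size_t type) {
  assert(data_file != nullptr);
  if (type >= offsets.size() || offsets[type] < 0) return false;
  return std::fseek(data_file, static_cast<long>(offsets[type]),
                    SEEK_SET) == 0;
}

// Returns the offset of the last byte of the component: one before the
// start of the next present component. Returns -1 when no later component
// is present, meaning the data runs to the end of the file.
std::int64_t EndOffset(const std::vector<std::int64_t> &offsets,
                       std::size_t type) {
  for (std::size_t i = type + 1; i < offsets.size(); ++i) {
    if (offsets[i] >= 0) return offsets[i] - 1;
  }
  return -1;
}
```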

bool tesseract::TessdataManager::TessdataTypeFromFileName ( const char *  filename,
TessdataType *  type,
bool *  text_file 
) [static]

Tries to determine tessdata component file suffix from filename, returns true on success.
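
One way to go from a filename to a component type is to take the text after the last '.' and look it up in a suffix table. The sketch below does this with a small illustrative table; the `DemoType` enum, the three suffixes chosen, the text/binary flags, and the function names are assumptions for the example and do not reproduce the real TessdataType list.

```cpp
#include <cassert>
#include <cstring>

// Illustrative subset standing in for the real component type enum.
enum DemoType { DEMO_CONFIG, DEMO_UNICHARSET, DEMO_INTTEMP, DEMO_NUM_TYPES };

struct DemoEntry {
  const char *suffix;
  bool is_text;  // assumed flag: true if the component is stored as text
};

static const DemoEntry kDemoTable[DEMO_NUM_TYPES] = {
    {"config", true}, {"unicharset", true}, {"inttemp", false}};

// Returns the text after the last '.' in the final path element, or
// nullptr if that element has no '.'.
const char *FileSuffix(const char *filename) {
  assert(filename != nullptr);
  const char *dot = std::strrchr(filename, '.');
  const char *slash = std::strrchr(filename, '/');
  if (dot == nullptr || (slash != nullptr && slash > dot)) return nullptr;
  return dot + 1;
}

// Looks the suffix up in the table and fills *type and *text_file.
bool DemoTypeFromSuffix(const char *suffix, DemoType *type, bool *text_file) {
  assert(suffix != nullptr && type != nullptr && text_file != nullptr);
  for (int i = 0; i < DEMO_NUM_TYPES; ++i) {
    if (std::strcmp(kDemoTable[i].suffix, suffix) == 0) {
      *type = static_cast<DemoType>(i);
      *text_file = kDemoTable[i].is_text;
      return true;
    }
  }
  return false;
}

// Filename variant: extract the suffix, then reuse the suffix lookup.
bool DemoTypeFromFileName(const char *filename, DemoType *type,
                          bool *text_file) {
  const char *suffix = FileSuffix(filename);
  return suffix != nullptr && DemoTypeFromSuffix(suffix, type, text_file);
}
```

Keeping the suffix lookup separate from the filename parsing mirrors the split between the two documented functions, so the same table serves both entry points.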

bool tesseract::TessdataManager::TessdataTypeFromFileSuffix ( const char *  suffix,
TessdataType *  type,
bool *  text_file 
) [static]

Fills type with TessdataType of the tessdata component represented by the given file name. E.g. tessdata/eng.unicharset -> TESSDATA_UNICHARSET. Sets *text_file to true if the component is in text format (e.g. unicharset, unichar ambigs, config, etc).

Returns:
true if the tessdata component type could be determined from the given file name.

void tesseract::TessdataManager::WriteMetadata ( inT64 *  offset_table,
FILE *  output_file 
) [static]

Writes the number of entries and the given offset table to output_file.
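
Writing an entry count followed by the table can be sketched as below. The integer widths (a 32-bit count and 64-bit offsets), the use of host byte order, and the name `WriteOffsetTable` are assumptions for the example; unlike the documented function, the sketch returns a bool so that a failed write is visible to the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical writer, not Tesseract code. Assumed layout: an int32 entry
// count followed by one int64 offset per entry, in host byte order.
bool WriteOffsetTable(const std::vector<std::int64_t> &offsets,
                      std::FILE *output_file) {
  assert(output_file != nullptr);
  const std::int32_t num_entries = static_cast<std::int32_t>(offsets.size());
  if (std::fwrite(&num_entries, sizeof(num_entries), 1, output_file) != 1) {
    return false;
  }
  if (offsets.empty()) return true;
  return std::fwrite(offsets.data(), sizeof(std::int64_t), offsets.size(),
                     output_file) == offsets.size();
}
```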


The documentation for this class was generated from the following files:
Generated on Sun Jul 18 17:11:23 2010 for Tesseract by  doxygen 1.6.3