Package opennlp.tools.tokenize
Class descriptions
The BPEModel stores learned BPE merge operations and can be serialized and deserialized for reuse.
A Tokenizer implementation that performs subword tokenization using Byte Pair Encoding (BPE).
Represents a pair of adjacent symbols in BPE.
A BaseToolFactory for BPE tokenization that manages the BPE merge rules artifact and its serialization within a BPEModel.
Learns BPE merge operations from a training corpus and produces a BPEModel.
A default TokenContextGenerator which produces events for maxent decisions for tokenization.
A rule-based detokenizer.
A basic Tokenizer implementation which performs tokenization using character classes.
Deprecated. TokenizerME is itself thread-safe.
A cross validator for tokenizers.
The TokenizerEvaluator measures the performance of the given Tokenizer with the provided reference samples.
The factory that provides the default Tokenizer implementation and resources.
A Tokenizer for converting raw text into separated tokens.
The TokenizerModel is the model used by a learnable Tokenizer.
This class is a stream filter which reads in string-encoded samples and creates samples out of them.
This class reads the samples via an Iterator and converts the samples into events which can be used by the maxent library for training.
This stream formats an ObjectStream of samples into whitespace-separated token strings.
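
The BPE entries above describe learning merge operations from a corpus and applying them to split words into subword units. The following is a minimal, self-contained sketch of that idea using only the Java standard library. It is not the OpenNLP implementation: the class name BpeSketch, its method names, and the use of Map.Entry to hold a symbol pair are all hypothetical choices made for illustration, and the tie-breaking rule (first pair seen wins) and the requirement that a pair occur at least twice are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a toy BPE learner and tokenizer, not the OpenNLP classes.
final class BpeSketch {

    // Learns at most maxMerges merge operations from a list of words.
    // Each merge is a pair of adjacent symbols (key = left, value = right).
    static List<Map.Entry<String, String>> learn(List<String> words, int maxMerges) {
        List<List<String>> corpus = new ArrayList<>();
        for (String word : words) {
            corpus.add(split(word));
        }
        List<Map.Entry<String, String>> merges = new ArrayList<>();
        for (int round = 0; round < maxMerges; round++) {
            // Count every adjacent symbol pair; insertion order is kept for stable ties.
            Map<Map.Entry<String, String>, Integer> counts = new LinkedHashMap<>();
            for (List<String> symbols : corpus) {
                for (int i = 0; i + 1 < symbols.size(); i++) {
                    counts.merge(Map.entry(symbols.get(i), symbols.get(i + 1)), 1, Integer::sum);
                }
            }
            // Pick the most frequent pair; assumption: it must occur at least twice.
            Map.Entry<String, String> best = null;
            int bestCount = 1;
            for (Map.Entry<Map.Entry<String, String>, Integer> e : counts.entrySet()) {
                if (e.getValue() > bestCount) {
                    best = e.getKey();
                    bestCount = e.getValue();
                }
            }
            if (best == null) {
                break; // nothing left worth merging
            }
            merges.add(best);
            for (int i = 0; i < corpus.size(); i++) {
                corpus.set(i, applyMerge(corpus.get(i), best));
            }
        }
        return merges;
    }

    // Splits a word into subword units by replaying the merges in learned order.
    static List<String> tokenize(String word, List<Map.Entry<String, String>> merges) {
        List<String> symbols = split(word);
        for (Map.Entry<String, String> merge : merges) {
            symbols = applyMerge(symbols, merge);
        }
        return symbols;
    }

    // Replaces each non-overlapping occurrence of the pair, left to right.
    private static List<String> applyMerge(List<String> symbols, Map.Entry<String, String> pair) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < symbols.size()) {
            boolean matches = i + 1 < symbols.size()
                    && symbols.get(i).equals(pair.getKey())
                    && symbols.get(i + 1).equals(pair.getValue());
            if (matches) {
                out.add(pair.getKey() + pair.getValue());
                i += 2;
            } else {
                out.add(symbols.get(i));
                i++;
            }
        }
        return out;
    }

    // Starts from one symbol per Unicode code point.
    private static List<String> split(String word) {
        List<String> symbols = new ArrayList<>();
        word.codePoints().forEach(cp -> symbols.add(new String(Character.toChars(cp))));
        return symbols;
    }
}
```

With the toy corpus "low", "low", "lower", "lowest", this sketch learns three merges, (l, o), (lo, w) and (low, e), and then splits the unseen word "lowly" into low, l, y. A real model would also need to be serialized for reuse, which the sketch leaves out.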