Class BPEModel

java.lang.Object
opennlp.tools.util.model.BaseModel
opennlp.tools.tokenize.BPEModel
All Implemented Interfaces:
Serializable, opennlp.tools.util.model.ArtifactProvider

public final class BPEModel extends BaseModel
The BPEModel stores the merge operations learned by byte-pair encoding (BPE) training and can be serialized and deserialized for reuse.

A model is created by the BPETokenizerTrainer and contains an ordered list of BPETokenizer.SymbolPair merge operations that define the BPE vocabulary. The model is persisted as a standard OpenNLP ZIP package with a bpe.merges artifact containing the merge rules.
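An ordered merge list is applied to a word by repeatedly merging the adjacent symbol pair whose merge appears earliest in the list. The following is a minimal, self-contained sketch of that standard BPE encoding procedure. It is not OpenNLP's BPETokenizer implementation. The `Pair` record is a hypothetical stand-in for BPETokenizer.SymbolPair, and the class name `BpeMergeSketch` and the merge list are illustrative. The sketch assumes that a merge's rank is its position in the ordered list and that lower-ranked merges apply first. It requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class BpeMergeSketch {

    // Hypothetical stand-in for a merge rule; not BPETokenizer.SymbolPair.
    record Pair(String left, String right) {}

    /** Applies ranked merges to one word, starting from single characters. */
    static List<String> encode(String word, List<Pair> merges) {
        // Rank = position in the ordered merge list; lower rank merges first.
        Map<Pair, Integer> rank = new HashMap<>();
        for (int i = 0; i < merges.size(); i++) {
            rank.putIfAbsent(merges.get(i), i);
        }
        // Start with one symbol per Unicode code point.
        List<String> symbols = new ArrayList<>();
        word.codePoints().forEach(cp -> symbols.add(new String(Character.toChars(cp))));
        while (symbols.size() > 1) {
            // Find the leftmost adjacent pair with the lowest merge rank.
            int best = -1;
            int bestRank = Integer.MAX_VALUE;
            for (int i = 0; i < symbols.size() - 1; i++) {
                Integer r = rank.get(new Pair(symbols.get(i), symbols.get(i + 1)));
                if (r != null && r < bestRank) {
                    bestRank = r;
                    best = i;
                }
            }
            if (best < 0) {
                break; // no adjacent pair has a learned merge
            }
            // Replace the pair with its concatenation.
            symbols.set(best, symbols.get(best) + symbols.get(best + 1));
            symbols.remove(best + 1);
        }
        return symbols;
    }

    public static void main(String[] args) {
        List<Pair> merges = List.of(
                new Pair("l", "o"), new Pair("lo", "w"), new Pair("e", "r"));
        System.out.println(encode("lower", merges));
    }
}
```

With these three merges, `encode("lower", merges)` returns `[low, er]`. The merges l+o, then lo+w, then e+r apply in turn, and because no merge joins `low` and `er`, encoding stops there.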

Usage:


 // Create via training (corpus holds the training text, prepared beforehand)
 BPETokenizerTrainer trainer = new BPETokenizerTrainer();
 BPEModel model = trainer.train(corpus, 10000, "en");

 // Save to disk
 model.serialize(Path.of("bpe-en.bin"));

 // Load from disk
 BPEModel loaded = new BPEModel(Path.of("bpe-en.bin"));

 // Use for tokenization
 BPETokenizer tokenizer = new BPETokenizer(loaded);
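Because the saved model is described as a standard ZIP package, its contents can be inspected with `java.util.zip` alone. The sketch below lists the entries of a saved model file and checks for one named `bpe.merges`. It assumes the artifact is stored under exactly that entry name. `ModelPackageInspector` is a hypothetical helper, not part of OpenNLP.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public final class ModelPackageInspector {

    /** Returns the names of all entries in a ZIP-based model package. */
    static List<String> entryNames(Path modelFile) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(modelFile.toFile())) {
            zip.stream().map(ZipEntry::getName).forEach(names::add);
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Model file written by model.serialize(...); defaults to the name used in the usage example.
        Path model = Path.of(args.length > 0 ? args[0] : "bpe-en.bin");
        List<String> names = entryNames(model);
        names.forEach(System.out::println);
        // Assumed entry name for the merge rules artifact.
        System.out.println("has bpe.merges: " + names.contains("bpe.merges"));
    }
}
```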
 
See Also: