Class BPETokenizerFactory

java.lang.Object
opennlp.tools.util.BaseToolFactory
opennlp.tools.tokenize.BPETokenizerFactory

public class BPETokenizerFactory extends BaseToolFactory
A BaseToolFactory for BPE tokenization that manages the BPE merge rules artifact and its serialization within a BPEModel.

This factory is responsible for:

  • Providing the BPETokenizerFactory.BPEMergesSerializer that reads and writes BPE merge rules as a text-based artifact (bpe.merges) inside the model ZIP package.
  • Supplying the merge rules to the BPEModel via BaseToolFactory.createArtifactMap().
  • Validating that a loaded model contains valid merge rules.

This class is typically not used directly. It is instantiated internally by BPETokenizerTrainer during training and by BPEModel during model loading.
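
To illustrate the idea of storing merge rules as a text-based artifact, here is a minimal sketch of a round-trip codec. It is not the actual BPEMergesSerializer: the class name MergesTextCodec and the line format (one rule per line, left and right symbol separated by a single space) are assumptions made for this example only.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT OpenNLP's BPEMergesSerializer.
final class MergesTextCodec {

  // Assumed format: "<left> <right>", one merge rule per line, in merge order.
  static void write(List<String[]> merges, OutputStream out) throws IOException {
    BufferedWriter writer =
        new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
    for (String[] pair : merges) {
      writer.write(pair[0] + " " + pair[1]);
      writer.write('\n');
    }
    writer.flush(); // flush only; the caller owns and closes the stream
  }

  // Reads the rules back in the order they were written; blank lines are skipped.
  static List<String[]> read(InputStream in) throws IOException {
    BufferedReader reader =
        new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    List<String[]> merges = new ArrayList<>();
    String line;
    while ((line = reader.readLine()) != null) {
      if (line.isEmpty()) {
        continue;
      }
      String[] parts = line.split(" ");
      if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
        throw new IOException("Malformed merge rule: " + line);
      }
      merges.add(parts);
    }
    return merges;
  }
}
```

Writing to and reading from plain streams mirrors how an artifact serializer is handed one entry of the model package at a time; the real serializer may differ in format and error handling.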

  • Constructor Details

    • BPETokenizerFactory

      public BPETokenizerFactory()
      Creates a BPETokenizerFactory. Required empty constructor for model loading.
    • BPETokenizerFactory

      public BPETokenizerFactory(String langCode)
      Creates a BPETokenizerFactory with the given language code.
      Parameters:
      langCode - The ISO language code. Must not be null.
      Throws:
      IllegalArgumentException - if langCode is null.
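
The two constructors follow a common pattern: an empty constructor so that a loader can create the factory reflectively from its class name, and a validating constructor for use at training time. The sketch below shows that pattern with a hypothetical SketchFactory class; it is not an OpenNLP class, and the assumption that the language code is simply unset after reflective creation is made for illustration only.

```java
// Hypothetical stand-in for a tool factory; NOT an OpenNLP class.
class SketchFactory {

  private final String languageCode;

  // Empty constructor so the class can be instantiated reflectively by name.
  SketchFactory() {
    this.languageCode = null; // assumption: no language code is known at this point
  }

  // Validating constructor, mirroring the documented null check.
  SketchFactory(String langCode) {
    if (langCode == null) {
      throw new IllegalArgumentException("langCode must not be null");
    }
    this.languageCode = langCode;
  }

  String getLanguageCode() {
    return languageCode;
  }

  // Creates an instance from a class name via its no-argument constructor.
  static SketchFactory instantiate(String className) {
    try {
      Class<?> type = Class.forName(className);
      return (SketchFactory) type.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot instantiate " + className, e);
    }
  }
}
```

Without the empty constructor, the reflective call in instantiate would fail with a NoSuchMethodException, which is why such a constructor is required for model loading.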
  • Method Details

    • createArtifactSerializersMap

      public Map<String,opennlp.tools.util.model.ArtifactSerializer<?>> createArtifactSerializersMap()
      Creates a Map that associates artifact keys with their ArtifactSerializer instances. The model's implementation should call this method from BaseModel#createArtifactSerializersMap.

      The base implementation will return a HashMap that should be populated by subclasses.

      Overrides:
      createArtifactSerializersMap in class BaseToolFactory
    • createManifestEntries

      public Map<String,String> createManifestEntries()
      Overrides:
      createManifestEntries in class BaseToolFactory
      Returns:
      The manifest entries to be added to the model manifest.
    • validateArtifactMap

      public void validateArtifactMap() throws opennlp.tools.util.InvalidFormatException
      Validates the parsed artifacts.

      Note: Subclasses should generally invoke super.validateArtifactMap at the beginning of this method.

      Specified by:
      validateArtifactMap in class BaseToolFactory
      Throws:
      opennlp.tools.util.InvalidFormatException - Thrown if validation finds an invalid state.
    • getLanguageCode

      public String getLanguageCode()
      Returns:
      The ISO language code for this factory.
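
As a rough illustration of what validating the parsed artifacts can look like, the sketch below checks that a map of loaded artifacts contains a merge rules entry of the expected type. The class MergesArtifactCheck, the nested InvalidArtifactException (a stand-in for InvalidFormatException), and the choice to represent the rules as a List are all assumptions for this example; only the entry name bpe.merges comes from the description above.

```java
import java.io.IOException;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; NOT the validation logic used by OpenNLP.
final class MergesArtifactCheck {

  // Entry name taken from the class description.
  static final String MERGES_ENTRY = "bpe.merges";

  // Stand-in for opennlp.tools.util.InvalidFormatException.
  static final class InvalidArtifactException extends IOException {
    private static final long serialVersionUID = 1L;

    InvalidArtifactException(String message) {
      super(message);
    }
  }

  // Fails if the merge rules entry is missing or has an unexpected type.
  static void validate(Map<String, Object> artifacts) throws InvalidArtifactException {
    Object merges = artifacts.get(MERGES_ENTRY);
    if (merges == null) {
      throw new InvalidArtifactException("Missing artifact: " + MERGES_ENTRY);
    }
    if (!(merges instanceof List)) {
      // Assumption: the rules are held as a List in this sketch.
      throw new InvalidArtifactException(
          "Unexpected type for " + MERGES_ENTRY + ": " + merges.getClass().getName());
    }
  }
}
```

A subclass that overrides validateArtifactMap would typically run checks of this kind after calling the superclass method, so that the base checks run first.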