Class POSTaggerME

java.lang.Object
opennlp.tools.postag.POSTaggerME
All Implemented Interfaces:
opennlp.tools.ml.Probabilistic, opennlp.tools.postag.POSTagger

@ThreadSafe public class POSTaggerME extends Object implements opennlp.tools.postag.POSTagger, opennlp.tools.ml.Probabilistic
A part-of-speech tagger implementation that uses maximum entropy.

Predicts the part-of-speech tag of each word (noun, verb, and so on) from its surrounding context.

A POS tagger instance is thread-safe. One instance can be shared across multiple threads to save both memory and model load time (loading a POSModel is the dominant startup cost; sharing one tagger avoids paying it per-thread).

Note: Thread safety uses LastResultOwnerOrThreadLocal (and related patterns elsewhere) so probs() sees per-thread last results without pinning unnecessary ThreadLocal entries for single-threaded short-lived instances. In container environments with classloader isolation (e.g. Jakarta EE), ensure instances do not outlive the application's lifecycle.
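The per-thread "last result" behaviour described in the note can be illustrated with a plain ThreadLocal. The sketch below is a simplified stand-in, not OpenNLP's LastResultOwnerOrThreadLocal: the class name, the fake probability parameter and the helper method are hypothetical and exist only to show that a later probs() call sees the result produced on the same thread.

```java
import java.util.Arrays;

/**
 * Illustrative stand-in (not OpenNLP code): a shared object that keeps the
 * "last result" per calling thread.
 */
class PerThreadLastResultSketch {

    // Each thread gets its own slot, so reads need no synchronization.
    private final ThreadLocal<double[]> lastProbs = new ThreadLocal<>();

    /** Pretends to tag a sentence and stores one fake probability per token. */
    String[] tag(String[] sentence, double fakeProb) {
        double[] probs = new double[sentence.length];
        Arrays.fill(probs, fakeProb);
        lastProbs.set(probs);
        String[] tags = new String[sentence.length];
        Arrays.fill(tags, "X"); // placeholder tag, no real model involved
        return tags;
    }

    /** Returns the probabilities of the last tag() call made on this thread. */
    double[] probs() {
        double[] probs = lastProbs.get();
        if (probs == null) {
            throw new IllegalStateException("tag() was not called on this thread");
        }
        return probs.clone();
    }

    /** Runs tag() and probs() on a fresh thread and returns what that thread saw. */
    static double[] tagOnNewThread(PerThreadLastResultSketch shared,
                                   String[] sentence, double fakeProb) {
        double[][] seen = new double[1][];
        Thread worker = new Thread(() -> {
            shared.tag(sentence, fakeProb);
            seen[0] = shared.probs();
        });
        worker.start();
        try {
            worker.join(); // join() makes the worker's write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

In this sketch, a tag() call on one thread never changes what probs() returns on another thread, which is the property the note attributes to a shared POSTaggerME.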

  • Field Details

    • DEFAULT_BEAM_SIZE

      public static final int DEFAULT_BEAM_SIZE
      The default beam size value is 3.
  • Constructor Details

    • POSTaggerME

      public POSTaggerME(String language) throws IOException
      Initializes a POSTaggerME by downloading a default model for a given language.
      Parameters:
      language - An ISO-conformant language code.
      Throws:
      IOException - Thrown if the model could not be downloaded or saved.
    • POSTaggerME

      public POSTaggerME(String language, POSTagFormat format) throws IOException
      Initializes a POSTaggerME by downloading a default model for a given language.
      Parameters:
      language - An ISO-conformant language code.
      format - A valid POSTagFormat.
      Throws:
      IOException - Thrown if the model could not be downloaded or saved.
    • POSTaggerME

      public POSTaggerME(POSModel model)
      Initializes a POSTaggerME with the provided model.
      Parameters:
      model - A valid POSModel.
    • POSTaggerME

      public POSTaggerME(POSModel model, POSTagFormat format)
      Initializes a POSTaggerME with the provided model.
      Parameters:
      model - A valid POSModel.
      format - A valid POSTagFormat.
    • POSTaggerME

      public POSTaggerME(POSModel model, POSTagFormat format, int contextCacheSize)
      Initializes a POSTaggerME with the provided model and explicit cache configuration.
      Parameters:
      model - A valid POSModel.
      format - A valid POSTagFormat.
      contextCacheSize - The size of the per-thread context generator cache. Use 0 to disable caching, -1 for the default (the beam size), or a positive value for an explicit size. Values less than -1 are not allowed.
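The contextCacheSize rules above can be written out as a small helper. This is a minimal sketch of the documented rules only: the helper name is hypothetical, and the use of IllegalArgumentException for values below -1 is an assumption, since the source does not say which exception the constructor throws.

```java
class ContextCacheSizeSketch {

    /**
     * Hypothetical helper: maps the documented contextCacheSize values to an
     * effective cache size. Not part of the OpenNLP API.
     */
    static int effectiveCacheSize(int contextCacheSize, int beamSize) {
        if (contextCacheSize < -1) {
            // Assumption: the exception type is not stated in the documentation.
            throw new IllegalArgumentException(
                "contextCacheSize must be >= -1, got " + contextCacheSize);
        }
        if (contextCacheSize == -1) {
            return beamSize; // documented default: the beam size
        }
        return contextCacheSize; // 0 disables caching, positive values are used as given
    }
}
```

With the documented default beam size of 3, passing -1 resolves to a cache size of 3 in this sketch, while 0 keeps caching off.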
  • Method Details

    • getAllPosTags

      public String[] getAllPosTags()
      Returns:
      An array of all possible part-of-speech tags known to the tagger.
    • tag

      public String[] tag(String[] sentence)
      Specified by:
      tag in interface opennlp.tools.postag.POSTagger
    • tag

      public String[] tag(String[] sentence, Object[] additionalContext)
      Specified by:
      tag in interface opennlp.tools.postag.POSTagger
    • tag

      public String[][] tag(int numTaggings, String[] sentence)
      Returns at most the specified numTaggings for the specified sentence.
      Parameters:
      numTaggings - The maximum number of taggings to be returned.
      sentence - An array of tokens which make up a sentence.
      Returns:
      At most the specified number of taggings for the specified sentence.
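To make the idea of several ranked taggings and a beam size concrete, here is a generic beam search over tag sequences. It is a sketch under stated assumptions, not the decoder POSTaggerME uses: the Scorer interface, the Hypothesis class and the search method are hypothetical, and a real tagger would score tags with a trained model rather than a hand-written function.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class BeamSearchSketch {

    /** A partial or complete tag sequence with its accumulated log-probability. */
    static final class Hypothesis {
        final List<String> tags;
        final double logProb;

        Hypothesis(List<String> tags, double logProb) {
            this.tags = tags;
            this.logProb = logProb;
        }
    }

    /** Hypothetical scorer: log-probability of a tag at a position given earlier tags. */
    interface Scorer {
        double logProb(int index, String tag, List<String> previousTags);
    }

    /** Returns at most numTaggings complete sequences, best first. */
    static List<Hypothesis> search(int length, String[] tagSet, Scorer scorer,
                                   int beamSize, int numTaggings) {
        List<Hypothesis> beam = new ArrayList<>();
        beam.add(new Hypothesis(new ArrayList<>(), 0.0));
        for (int i = 0; i < length; i++) {
            // Extend every surviving hypothesis with every candidate tag.
            List<Hypothesis> expanded = new ArrayList<>();
            for (Hypothesis h : beam) {
                for (String tag : tagSet) {
                    List<String> tags = new ArrayList<>(h.tags);
                    tags.add(tag);
                    expanded.add(new Hypothesis(tags, h.logProb + scorer.logProb(i, tag, h.tags)));
                }
            }
            // Keep only the beamSize highest-scoring hypotheses.
            expanded.sort(Comparator.comparingDouble((Hypothesis h) -> h.logProb).reversed());
            beam = new ArrayList<>(expanded.subList(0, Math.min(beamSize, expanded.size())));
        }
        return new ArrayList<>(beam.subList(0, Math.min(numTaggings, beam.size())));
    }
}
```

In this sketch the result can never hold more sequences than the beam width, which is one way an "at most numTaggings" contract can arise. Whether POSTaggerME caps results in exactly this way is not stated in the documentation above.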
    • topKSequences

      public opennlp.tools.util.Sequence[] topKSequences(String[] sentence)
      Specified by:
      topKSequences in interface opennlp.tools.postag.POSTagger
    • topKSequences

      public opennlp.tools.util.Sequence[] topKSequences(String[] sentence, Object[] additionalContext)
      Specified by:
      topKSequences in interface opennlp.tools.postag.POSTagger
    • probs

      public void probs(double[] probs)
      Populates the specified probs array with the probabilities for each tag of the last tagged sentence.
      Parameters:
      probs - An array to put the probabilities into.
    • probs

      public double[] probs()
      Returns the probabilities of the last decoded sequence. That sequence is the one determined by the previous call to tag(String[]).
      Specified by:
      probs in interface opennlp.tools.ml.Probabilistic
      Returns:
      An array with one probability per token passed to the most recent call to tag(String[]).
    • clearThreadLocalState

      public void clearThreadLocalState()
      Removes thread-local state to prevent classloader leaks in container environments. Call when the thread is returned to a pool or the tagger is no longer needed.
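The advice to clear state when a thread goes back to a pool is usually implemented with try/finally inside the pooled task. The sketch below shows that pattern with a hypothetical StatefulTagger stand-in, so it runs without a model. Only the method name clearThreadLocalState() is taken from the documentation above, and everything else is illustrative.

```java
import java.util.Arrays;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PooledCleanupSketch {

    /** Hypothetical stand-in for a tagger that keeps per-thread state. */
    static final class StatefulTagger {
        private final ThreadLocal<String[]> lastTags = new ThreadLocal<>();

        String[] tag(String[] sentence) {
            String[] tags = new String[sentence.length];
            Arrays.fill(tags, "X"); // placeholder tag, no real model involved
            lastTags.set(tags);
            return tags;
        }

        boolean hasThreadLocalState() {
            return lastTags.get() != null;
        }

        void clearThreadLocalState() {
            lastTags.remove();
        }
    }

    /** Tags on a pooled thread, then reports whether state is still on that worker. */
    static boolean stateLeftOnWorker(boolean cleanUp) {
        StatefulTagger tagger = new StatefulTagger();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable task = () -> {
                try {
                    tagger.tag(new String[] {"Hello", "world"});
                } finally {
                    // Clear before the worker thread is handed back to the pool.
                    if (cleanUp) {
                        tagger.clearThreadLocalState();
                    }
                }
            };
            pool.submit(task).get();
            // The single worker thread is reused, so this inspects the same thread.
            Callable<Boolean> check = tagger::hasThreadLocalState;
            return pool.submit(check).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Without the finally block, the stand-in's state stays attached to the worker thread for as long as the pool keeps that thread alive. That is the kind of lingering reference the method description warns about.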
    • getOrderedTags

      public String[] getOrderedTags(List<String> words, List<String> tags, int index)
    • getOrderedTags

      public String[] getOrderedTags(List<String> words, List<String> tags, int index, double[] tprobs)
    • train

      public static POSModel train(String languageCode, opennlp.tools.util.ObjectStream<opennlp.tools.postag.POSSample> samples, opennlp.tools.util.TrainingParameters mlParams, POSTaggerFactory posFactory) throws IOException
      Trains a POSModel with the given parameters.
      Parameters:
      languageCode - The ISO language code of the language the model is trained for. Must not be null.
      samples - The ObjectStream of POSSample used as input for training.
      mlParams - The TrainingParameters for the context of the training process.
      posFactory - The POSTaggerFactory for creating related objects as defined via mlParams.
      Returns:
      A valid, trained POSModel instance.
      Throws:
      IOException - Thrown if IO errors occurred.
    • buildNGramDictionary

      public static Dictionary buildNGramDictionary(opennlp.tools.util.ObjectStream<opennlp.tools.postag.POSSample> samples, int cutoff) throws IOException
      Constructs an nGram dictionary from an ObjectStream of samples.
      Parameters:
      samples - The ObjectStream to process.
      cutoff - A non-negative cut-off value.
      Returns:
      A valid Dictionary instance holding nGrams.
      Throws:
      IOException - Thrown if IO errors occurred during dictionary construction.
    • populatePOSDictionary

      public static void populatePOSDictionary(opennlp.tools.util.ObjectStream<opennlp.tools.postag.POSSample> samples, opennlp.tools.postag.MutableTagDictionary dict, int cutoff) throws IOException
      Populates a POSDictionary from an ObjectStream of samples.
      Parameters:
      samples - The ObjectStream to process.
      dict - The MutableTagDictionary to use during population.
      cutoff - A non-negative cut-off value.
      Throws:
      IOException - Thrown if IO errors occurred during dictionary construction.
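Both dictionary helpers above take a cutoff. The sketch below shows one common reading of such a parameter for a tag dictionary: a word keeps a tag only if that word/tag pair was seen at least cutoff times. This is an assumption for illustration; the source does not define the exact comparison, and the class, method and input representation (word/tag pairs as two-element arrays) are hypothetical rather than OpenNLP types.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class CutoffTagDictionarySketch {

    /**
     * Builds a word-to-tags map from {word, tag} pairs, keeping a tag for a word
     * only if the pair occurs at least cutoff times (assumed semantics).
     */
    static Map<String, Set<String>> populate(List<String[]> wordTagPairs, int cutoff) {
        if (cutoff < 0) {
            throw new IllegalArgumentException("cutoff must be non-negative");
        }
        // Count how often each tag was observed for each word.
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (String[] pair : wordTagPairs) {
            counts.computeIfAbsent(pair[0], w -> new TreeMap<>())
                  .merge(pair[1], 1, Integer::sum);
        }
        // Keep only the tags that reach the cutoff.
        Map<String, Set<String>> dictionary = new TreeMap<>();
        for (Map.Entry<String, Map<String, Integer>> wordEntry : counts.entrySet()) {
            Set<String> kept = new TreeSet<>();
            for (Map.Entry<String, Integer> tagEntry : wordEntry.getValue().entrySet()) {
                if (tagEntry.getValue() >= cutoff) {
                    kept.add(tagEntry.getKey());
                }
            }
            if (!kept.isEmpty()) {
                dictionary.put(wordEntry.getKey(), kept);
            }
        }
        return dictionary;
    }
}
```

Under this reading, a higher cutoff drops rare word/tag pairs, which tend to come from annotation noise, at the cost of losing genuinely rare usages.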