Package opennlp.tools.sentdetect
Class SentenceDetectorME
java.lang.Object
opennlp.tools.sentdetect.SentenceDetectorME
- All Implemented Interfaces:
opennlp.tools.ml.Probabilistic, opennlp.tools.sentdetect.SentenceDetector
@ThreadSafe
public class SentenceDetectorME
extends Object
implements opennlp.tools.sentdetect.SentenceDetector, opennlp.tools.ml.Probabilistic
A sentence detector for splitting up raw text into sentences.
A maximum entropy model is used to evaluate end-of-sentence characters in a string to determine if they signify the end of a sentence.
A sentence detector instance is thread-safe. One instance can be shared across multiple threads to save memory.
Note: In container environments with classloader isolation (e.g. Jakarta EE), ensure instances do
not outlive the application's lifecycle, as underlying components use ThreadLocal state that may
pin the classloader.
-
Field Summary
Fields -
Constructor Summary
Constructors
SentenceDetectorME(String language): Initializes the sentence detector by downloading a default model.
SentenceDetectorME(SentenceModel model): Initializes the current instance.
SentenceDetectorME(SentenceModel model, Dictionary abbDict): Instantiates a SentenceDetectorME with an existing SentenceModel.
SentenceDetectorME(SentenceModel model, Factory factory): Deprecated.
-
Method Summary
Modifier and Type Method: Description
void clearThreadLocalState(): Removes thread-local state to prevent classloader leaks in container environments.
double[] getSentenceProbabilities(): Deprecated, for removal: This API element is subject to removal in a future version. Use probs() instead.
double[] probs(): The probabilities refer to the sequence determined by the previous call to sentDetect(CharSequence).
String[] sentDetect(CharSequence s): Detects sentences in the given input CharSequence.
opennlp.tools.util.Span[] sentPosDetect(CharSequence s): Detects the position of the first words of sentences in a CharSequence.
static SentenceModel train(String languageCode, opennlp.tools.util.ObjectStream<opennlp.tools.sentdetect.SentenceSample> samples, SentenceDetectorFactory sdFactory, opennlp.tools.util.TrainingParameters mlParams): Starts the training of a SentenceModel with the given parameters.
-
Field Details
-
SPLIT
Constant indicating a sentence split.
-
NO_SPLIT
Constant indicating no sentence split.
-
-
Constructor Details
-
SentenceDetectorME
Initializes the sentence detector by downloading a default model.
Parameters:
language - The language of the sentence detector.
Throws:
IOException - Thrown if the model cannot be downloaded or saved.
-
SentenceDetectorME
Initializes the current instance.
Parameters:
model - The SentenceModel.
-
SentenceDetectorME
Instantiates a SentenceDetectorME with an existing SentenceModel.
Parameters:
model - The SentenceModel to be used.
abbDict - The Dictionary to be used. It must fit the language of the model.
-
SentenceDetectorME
Deprecated. Use a SentenceDetectorFactory to extend SentenceDetector functionality.
-
-
Method Details
-
sentDetect
Detects sentences in the given input CharSequence.
Specified by:
sentDetect in interface opennlp.tools.sentdetect.SentenceDetector
Parameters:
s - The CharSequence to be processed.
Returns:
A string array containing individual sentences as elements.
-
sentPosDetect
Detects the position of the first words of sentences in a CharSequence.
Specified by:
sentPosDetect in interface opennlp.tools.sentdetect.SentenceDetector
Parameters:
s - The CharSequence to be processed.
Returns:
A Span array containing the positions of the end index of every sentence.
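The following is a minimal sketch of how character offsets like those in the returned array can be mapped back to sentence text. It assumes each span has an inclusive start and an exclusive end offset into the input. Offsets is a hypothetical stand-in for opennlp.tools.util.Span so the example runs without the library, and the offsets in main are hand-written, not real detector output.

```java
import java.util.ArrayList;
import java.util.List;

public class SpanTextDemo {

    /** Hypothetical stand-in for opennlp.tools.util.Span: start inclusive, end exclusive (assumed). */
    static final class Offsets {
        final int start;
        final int end;

        Offsets(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Returns the substring of text covered by each span, in the order given. */
    static List<String> coveredText(CharSequence text, List<Offsets> spans) {
        List<String> sentences = new ArrayList<>();
        for (Offsets span : spans) {
            sentences.add(text.subSequence(span.start, span.end).toString());
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "First sentence. Second one.";
        // Hand-written offsets for illustration; a detector would supply these.
        List<Offsets> spans = List.of(new Offsets(0, 15), new Offsets(16, 27));
        for (String sentence : coveredText(text, spans)) {
            System.out.println(sentence);
        }
    }
}
```

Working with offsets instead of strings keeps the link to the original text, which can help when sentence boundaries have to be aligned with other annotations.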
-
probs
public double[] probs()
The probabilities refer to the sequence determined by the previous call to sentDetect(CharSequence).
Specified by:
probs in interface opennlp.tools.ml.Probabilistic
Returns:
An array with one probability per sentence returned by the last sentDetect(CharSequence) call; if not applicable, an empty array.
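As a rough sketch of how the probabilities might be consumed, the snippet below pairs a sentence array with a probability array by index and keeps only the sentences above a threshold. It assumes that element i of the probability array belongs to sentence i. The helper keepConfident, the threshold, and the sample values are all hypothetical and stand in for the results of sentDetect(CharSequence) and probs().

```java
import java.util.ArrayList;
import java.util.List;

public class ConfidenceFilterDemo {

    /**
     * Keeps the sentences whose probability is at least minProb.
     * Assumes sentences[i] and probs[i] describe the same sentence.
     */
    static List<String> keepConfident(String[] sentences, double[] probs, double minProb) {
        if (sentences.length != probs.length) {
            throw new IllegalArgumentException("sentences and probs must have the same length");
        }
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < sentences.length; i++) {
            if (probs[i] >= minProb) {
                kept.add(sentences[i]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up values standing in for sentDetect(...) and probs() results.
        String[] sentences = {"It works.", "Dr.", "Smith left."};
        double[] probs = {0.98, 0.41, 0.93};
        System.out.println(keepConfident(sentences, probs, 0.5));
    }
}
```

The length check guards against the documented case where the probability array is empty because no detection result applies.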
-
clearThreadLocalState
public void clearThreadLocalState()
Removes thread-local state to prevent classloader leaks in container environments. Call when the thread is returned to a pool or the sentence detector is no longer needed.
-
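One way to follow this advice with pooled threads is to clear the state in a finally block inside the task, so the cleanup runs on the worker thread that did the detection. The sketch below illustrates only that pattern: Detector and ToyDetector are hypothetical stand-ins for SentenceDetectorME so the example runs without the library, and the toy splitting rule is not the maximum entropy model.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadLocalCleanupDemo {

    /** Hypothetical stand-in exposing the two methods used in this sketch. */
    interface Detector {
        String[] sentDetect(CharSequence s);
        void clearThreadLocalState();
    }

    /** Toy implementation: splits after a period and counts cleanup calls. */
    static final class ToyDetector implements Detector {
        final AtomicInteger cleanups = new AtomicInteger();

        @Override
        public String[] sentDetect(CharSequence s) {
            return s.toString().split("(?<=\\.)\\s+");
        }

        @Override
        public void clearThreadLocalState() {
            cleanups.incrementAndGet();
        }
    }

    /** Runs detection on a pooled thread and always clears per-thread state afterwards. */
    static String[] detectOnPool(ExecutorService pool, Detector detector, String text) {
        Future<String[]> result = pool.submit(() -> {
            try {
                return detector.sentDetect(text);
            } finally {
                detector.clearThreadLocalState(); // runs on the worker thread
            }
        });
        try {
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        ToyDetector shared = new ToyDetector(); // one instance shared by all tasks
        try {
            for (String sentence : detectOnPool(pool, shared, "One. Two.")) {
                System.out.println(sentence);
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Placing the call in finally means the cleanup also happens when detection throws, before the thread goes back to the pool.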
getSentenceProbabilities
Deprecated, for removal: This API element is subject to removal in a future version. Use probs() instead.
Returns:
The probability for each sentence returned by the most recent call to sentDetect(CharSequence); if not applicable, an empty array.
-
train
public static SentenceModel train(String languageCode, opennlp.tools.util.ObjectStream<opennlp.tools.sentdetect.SentenceSample> samples, SentenceDetectorFactory sdFactory, opennlp.tools.util.TrainingParameters mlParams) throws IOException
Starts the training of a SentenceModel with the given parameters.
Parameters:
languageCode - The ISO language code to train the model. Must not be null.
samples - The ObjectStream of SentenceSample used as input for training.
sdFactory - The SentenceDetectorFactory for creating related objects as defined via mlParams.
mlParams - The TrainingParameters for the context of the training process.
Returns:
A valid, trained SentenceModel instance.
Throws:
IOException - Thrown if IO errors occur.
-