cue.language is a small, easy-to-use library of Java code and resources that provides the following basic natural-language processing capabilities:
· Tokenizing natural-language text into individual words
· Tokenizing natural-language text into sentences
· Tokenizing natural-language text into n-grams (sequences of two or more adjacent words within a sentence)
· Counting occurrences of strings
· Detecting which script (alphabet, writing system) is required to represent a text
· Guessing which language a text is written in
· Customizable "stop word" detection for a variety of languages
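To illustrate what the tokenizing and counting capabilities above involve, here is a minimal sketch that uses nothing but the JDK. It is not cue.language's API. The class `NlpSketch` and its methods `sentences`, `words`, `ngrams` and `count` are hypothetical names, and `java.text.BreakIterator` stands in for the library's own tokenizers. The sketch splits text into sentences, tokenizes one sentence into words, joins adjacent words into n-grams, and counts how often each string occurs.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical plain-JDK sketch; NOT the cue.language API.
public class NlpSketch {

    // Split text into trimmed, non-empty sentences using the JDK's sentence BreakIterator.
    static List<String> sentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    // Split text into words with the word BreakIterator, keeping only segments
    // that contain a letter or digit (this drops spaces and punctuation).
    static List<String> words(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String token = text.substring(start, end);
            if (token.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(token);
            }
        }
        return out;
    }

    // Join every run of n adjacent words (from a single sentence) into one n-gram.
    static List<String> ngrams(List<String> words, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= words.size(); i++) {
            out.add(String.join(" ", words.subList(i, i + n)));
        }
        return out;
    }

    // Count occurrences of each string, keeping first-seen order.
    static Map<String, Integer> count(List<String> items) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String s : items) {
            counts.merge(s, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        for (String sentence : sentences("The cat sat on the mat. The dog ran.")) {
            List<String> w = words(sentence);
            System.out.println(w);
            System.out.println(ngrams(w, 2));
            System.out.println(count(w));
        }
    }
}
```

N-grams are built per sentence here so that no n-gram spans a sentence boundary, matching the definition in the list above. Note that counting is case-sensitive in this sketch ("The" and "the" are separate keys); a real application would typically normalize case and filter stop words first.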
Requirements:
· Java