-
Count words in C lang
Previously we showed how to count words in Java. Now we will demonstrate how to count words in C. First, […]
-
Apache OpenNLP – Tokenization
Tokenization is the process of segmenting strings into smaller parts called tokens (that is, substrings). Usually, these tokens are words, numbers, or […]
-
NLP – Natural language processing
Natural Language Processing, or NLP, is broadly defined as the automatic manipulation of natural language, such as speech and text, by software. One […]
-
English lemmatization
Lemmatization is the process of reducing an inflected spelling to its lexical root or lemma form. The lemma form is […]
-
Word list
A word list (or lexicon) is a list of a language’s lexicon (generally sorted by frequency of occurrence either by […]
-
Text corpus
The term language corpus is used to mean a number of rather different things. It may refer simply to any […]
-
Bound Morphemes
A bound morpheme is a word element that cannot stand alone as a word, including both prefixes and suffixes. A […]
-
Lexicon
A lexicon is the vocabulary of a language or branch of knowledge. A list of all the words used in […]
-
Counting characters in Java
There are many ways to count the number of characters in a String. Below is a simple, naive approach:
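The post's own code is not reproduced in this excerpt, so the following is only a minimal sketch of what a naive approach could look like. The class and method names (`CharacterCount`, `countNonWhitespace`) are hypothetical, and the choice to skip whitespace is an assumption; use `text.length()` directly if every character should count.

```java
class CharacterCount {

    // Naive approach: walk the string once and count every char
    // that is not whitespace. Skipping whitespace is an assumption
    // of this sketch, not something the excerpt specifies.
    static int countNonWhitespace(String text) {
        if (text == null) {
            return 0;
        }
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (!Character.isWhitespace(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "Hello World";
        System.out.println("Total chars:          " + sample.length());
        System.out.println("Non-whitespace chars: " + countNonWhitespace(sample));
    }
}
```

Note that this counts UTF-16 code units, so a character outside the Basic Multilingual Plane (most emoji, for example) counts as two. `String.codePointCount(0, text.length())` may be a better fit when that matters.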
-
Counting words in Java
This is a simple way to count words in a string in Java. StringTokenizer automatically takes care of whitespace for […]
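Since the excerpt is truncated, here is a hedged sketch of how word counting with `StringTokenizer` might look. The class and method names (`WordCount`, `countWords`) are hypothetical; `StringTokenizer` and its `countTokens()` method are part of `java.util`.

```java
import java.util.StringTokenizer;

class WordCount {

    // Counts whitespace-separated tokens. With the single-argument
    // constructor, StringTokenizer splits on space, tab, newline,
    // carriage return, and form feed, and ignores repeated delimiters.
    static int countWords(String text) {
        if (text == null) {
            return 0;
        }
        return new StringTokenizer(text).countTokens();
    }

    public static void main(String[] args) {
        String sample = "  the quick   brown fox ";
        System.out.println("Words: " + countWords(sample));
    }
}
```

Because consecutive delimiters are collapsed, leading, trailing, and repeated spaces do not produce empty tokens. Punctuation attached to a word (as in "fox.") stays part of that token, so this counts whitespace-delimited tokens rather than linguistic words.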