Word Frequency Blog

Word Frequency Tool

  • NLP – Natural language processing

    Natural Language Processing, or NLP, is broadly defined as the automatic manipulation of natural language, such as speech and text, by software. One of the first steps in NLP is extracting tokens from text. Tokenization splits text into tokens, which are typically words. Usually, tokens are split based upon […]

    April 27, 2022
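
To make the tokenization step in the excerpt above concrete, here is a minimal sketch in Java. The excerpt is cut off before it says what tokens are split on, so the rule used here (a token is a run of letters or digits, and anything else is a separator) is an assumption for illustration, and `SimpleTokenizer` is a hypothetical name, not code from the post.

```java
import java.util.ArrayList;
import java.util.List;

class SimpleTokenizer {

    // Assumption: a token is a maximal run of Unicode letters or decimal digits.
    // Whitespace, punctuation, and hyphens all act as separators.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{Nd}]+")) {
            // split() yields an empty first element when the text starts with a separator.
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Tokens are words, usually."));
    }
}
```

Under this rule a hyphenated form such as "he-man" comes out as two tokens; a real tokenizer may well want different behavior for hyphens, apostrophes, and numbers.
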
  • English Lemmatizer

    Lemmatization is the process of reducing an inflected spelling to its lexical root, or lemma form. The lemma form is the base form or headword form you would find in a dictionary. The combination of the lemma form with its word class (noun, verb, etc.) is called the lexeme. In English, the base form for […]

    April 25, 2022
  • Word list

    A word list (or lexicon) is a list of the words of a language found within some given text corpus, generally sorted by frequency of occurrence (either by levels or as a ranked list), and serving the purpose of vocabulary acquisition. A lexicon sorted by frequency “provides a rational basis for making sure that learners get the best return […]

    March 24, 2022
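
As a rough illustration of a word list ranked by frequency, the sketch below counts words in a small corpus string and returns them most frequent first. The class name `FrequencyWordList`, the whitespace-only splitting, the lowercasing, and the alphabetical tie-break are all assumptions made for this example rather than anything the post specifies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class FrequencyWordList {

    // Returns the distinct words of the corpus, most frequent first.
    // Assumption: words are separated by whitespace and compared case-insensitively.
    static List<String> rankedWords(String corpus) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : corpus.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        List<String> words = new ArrayList<>(counts.keySet());
        // Sort by descending count; break ties alphabetically so the order is stable.
        words.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return words;
    }

    public static void main(String[] args) {
        System.out.println(rankedWords("the cat and the hat and the bat"));
    }
}
```

For the sample sentence in `main`, "the" occurs three times and "and" twice, so they lead the list, followed by the single-occurrence words in alphabetical order.
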
  • Text corpus

    The term language corpus is used to mean a number of rather different things. It may refer simply to any collection of linguistic data (for example, written, spoken, signed, or multimodal), although many practitioners prefer to reserve it for collections which have been organized or collected with a particular end in view, generally to characterize […]

    March 15, 2022
  • Bound Morphemes

    A bound morpheme is a word element that cannot stand alone as a word; prefixes and suffixes are both bound morphemes. A morpheme is the smallest meaningful lexical item in a language, and it is not the same thing as a word. The difference between a morpheme and a word is that a morpheme sometimes does not stand […]

    March 14, 2022
  • Lexicon

    A lexicon is the vocabulary of a language or branch of knowledge. The term can also mean a list of all the words used in a particular language or subject, or a dictionary. Linguistic theories generally regard human languages as consisting of two parts: a lexicon, essentially a catalogue of a language’s words; and a grammar, a system of rules […]

    March 14, 2022
  • Counting characters in Java

    There are many ways to count the number of characters in a String. Below is a simple, naive approach:

    March 11, 2022
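
The post's own listing is not reproduced in this excerpt, so the following is only a sketch of what a naive character count might look like. `CharacterCounter` and its two methods are hypothetical names, and counting non-whitespace characters separately is an assumption about what a reader may want.

```java
class CharacterCounter {

    // Total number of chars (UTF-16 code units), i.e. what String.length() reports.
    static int countAll(String text) {
        return text.length();
    }

    // Naive loop that skips anything Character.isWhitespace() treats as whitespace.
    static int countNonWhitespace(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (!Character.isWhitespace(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "he man\t!";
        System.out.println(countAll(sample));
        System.out.println(countNonWhitespace(sample));
    }
}
```

Both methods count UTF-16 code units, so a character outside the Basic Multilingual Plane counts as two; `String.codePointCount` is the standard alternative when that matters.
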
  • Counting words in Java

    This is a simple way to count words in a string in Java. StringTokenizer automatically takes care of whitespace for us, including tabs and carriage returns. In some cases, such as “he-man”, we’d want “he” and “man” to be different words, but since there’s no whitespace between them, the defaults fail us. Fortunately, we can […]

    March 11, 2022
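
The excerpt above is truncated before it shows the fix, so the sketch below is one possible approach rather than the post's own code: it counts words with `StringTokenizer` using the default whitespace delimiters, then again with a caller-supplied delimiter set that adds the hyphen. `WordCounter` is a hypothetical name.

```java
import java.util.StringTokenizer;

class WordCounter {

    // Default delimiters are space, tab, newline, carriage return, and form feed.
    static int countWords(String text) {
        return new StringTokenizer(text).countTokens();
    }

    // Assumption: treating extra characters (here, a hyphen) as delimiters
    // is an acceptable way to split forms like "he-man" into two words.
    static int countWords(String text, String delimiters) {
        return new StringTokenizer(text, delimiters).countTokens();
    }

    public static void main(String[] args) {
        String sample = "he-man rides\tagain";
        System.out.println(countWords(sample));
        System.out.println(countWords(sample, " \t\n\r\f-"));
    }
}
```

With the defaults, "he-man" is a single token; once the hyphen is added to the delimiter string it is counted as two words.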


  • Java
  • Linguistic
  • Programming