Category: Linguistic

  • NLP – Natural language processing

    Natural Language Processing, or NLP, is broadly defined as the automatic manipulation of natural language, such as speech and text, by software. One of the first steps in NLP is the extraction of tokens from text. The process of tokenization splits text into tokens, that is, words. Usually, tokens are split based upon […]

  • English Lemmatizer

    Lemmatization is the process of reducing an inflected spelling to its lexical root, or lemma form. The lemma form is the base form or headword form you would find in a dictionary. The combination of the lemma form with its word class (noun, verb, etc.) is called the lexeme. In English, the base form for […]

  • Word list

    A word list (or lexicon) is a list of a language’s words (generally sorted by frequency of occurrence, either by levels or as a ranked list) drawn from some given text corpus, serving the purpose of vocabulary acquisition. A lexicon sorted by frequency “provides a rational basis for making sure that learners get the best return […]

  • Text corpus

    The term language corpus is used to mean a number of rather different things. It may refer simply to any collection of linguistic data (for example, written, spoken, signed, or multimodal), although many practitioners prefer to reserve it for collections which have been organized or collected with a particular end in view, generally to characterize […]

  • Bound Morphemes

    A bound morpheme is a word element that cannot stand alone as a word, such as a prefix or a suffix. It is a kind of morpheme (the smallest meaningful lexical item in a language). A morpheme is not a word. The difference between a morpheme and a word is that a morpheme sometimes does not stand […]

  • Lexicon

    A lexicon is the vocabulary of a language or branch of knowledge. The term can also mean a list of all the words used in a particular language or subject, or a dictionary. Linguistic theories generally regard human languages as consisting of two parts: a lexicon, essentially a catalogue of a language’s words; and a grammar, a system of rules […]
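
Several of the entries above (tokenization, word lists, text corpora) fit together in a few lines of code. The following is a minimal Python sketch, not the method used in any of the listed posts: it assumes a simple regex-based tokenizer, and the names `tokenize` and `ranked_word_list` and the two-sentence sample corpus are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens.

    Assumption: a token is a run of letters, optionally followed by an
    apostrophe and more letters (e.g. "don't"). Real tokenizers handle
    many more cases (numbers, hyphens, non-ASCII letters).
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def ranked_word_list(corpus):
    """Return (word, count) pairs, most frequent first.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    counts = Counter()
    for document in corpus:
        counts.update(tokenize(document))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical two-document corpus, for illustration only.
corpus = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
]

word_list = ranked_word_list(corpus)
for word, count in word_list:
    print(f"{count}\t{word}")
```

The sketch counts surface forms only. A lemmatizer would be needed to merge inflected forms such as "sat" and "sit" into a single lexicon entry.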