Natural Language Processing, What Is It?

Lately, we have been hearing a lot about Natural Language Processing (NLP), especially in today's digital world. Yet NLP is a discipline with more than 50 years of research and development behind it.

What is natural language processing?

Natural Language Processing is the field of Artificial Intelligence that investigates how machines can communicate with people in natural languages such as Spanish, English, or Chinese.

Virtually any human language can be processed by computers. In practice, though, economic and practical constraints mean that only the languages most widely spoken, or most used in the digital world, have applications in use.

Consider how many languages Siri (20) or Google Assistant (8) speaks. English, Spanish, German, French, Portuguese, Chinese, Arabic, and Japanese (not necessarily in this order) are the languages with the most applications that understand them. Google Translate handles the most languages, exceeding a hundred… but there are between 5,000 and 7,000 languages in the world.

Human languages can be expressed in writing (text), orally (voice), and also through signs. Naturally, NLP is most advanced in text processing, where much more data exists and is easier to obtain in electronic format.

Audio, even when it is in digital format, must be processed to transcribe it into letters or characters and, from there, to understand the question. The response process is the reverse: first the sentence is composed, and then the voice is synthesized.
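
As a rough sketch of this round trip, the following transcribes a recorded question and then speaks a canned reply. It assumes the third-party SpeechRecognition and pyttsx3 Python packages (not named in this article), and "question.wav" is a hypothetical input file; the understanding and response steps in the middle are elided.

```python
# A minimal speech-in, speech-out sketch (assumed libraries: SpeechRecognition
# and pyttsx3; "question.wav" is a hypothetical recording).
import speech_recognition as sr
import pyttsx3

recognizer = sr.Recognizer()
with sr.AudioFile("question.wav") as source:
    audio = recognizer.record(source)       # read the digital audio

# Transcribe speech to text (this particular recognizer calls a web API).
text = recognizer.recognize_google(audio)

reply = f"You said: {text}"                 # a real system would reason here

engine = pyttsx3.init()                     # synthesize the reply as voice
engine.say(reply)
engine.runAndWait()
```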

Incidentally, artificial voices sound increasingly human, with tonal and prosodic inflections that mimic human speech.

Models for natural language processing

Treating a language computationally implies a process of mathematical modeling. Computers understand only bytes and digits, and computer scientists write programs in programming languages such as C, Python, or Java.

Computational linguists are entrusted with “preparing” the linguistic model for computer engineers to implement in efficient and functional code. There are two general approaches to the problem of linguistic modeling:

Logical Models: grammars

Linguists write structural pattern-recognition rules, employing a specific grammatical formalism. These rules, combined with the information stored in computational dictionaries, define the patterns that must be recognized to solve the task (information retrieval, translation, etc.).

These logical models are intended to reflect the logical structure of language and arise from the theories of Noam Chomsky in the 1950s.
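
As a minimal sketch of this rule-based approach, the following defines a tiny context-free grammar (the formalism Chomsky introduced) and parses an invented sentence with it, assuming the NLTK library (an assumed toolkit choice, not one named in this article).

```python
# A minimal sketch of grammar-based parsing with NLTK.
import nltk

# Hand-written structural rules plus a small "dictionary" of terminals.
grammar = nltk.CFG.fromstring("""
    S   -> NP VP
    NP  -> Det N
    VP  -> V NP
    Det -> 'the'
    N   -> 'dog' | 'cat'
    V   -> 'chased'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the dog chased the cat".split()):
    tree.pretty_print()  # show the recognized syntactic structure
```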

Probabilistic models of natural language: based on data

The approach here is the reverse: linguists collect collections of examples and data (a corpus) and, from them, calculate the frequencies of different linguistic units (letters, words, sentences) and their probability of appearing in a given context. With these probabilities, the next unit in a given context can be predicted without resorting to explicit grammar rules.
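
As a minimal illustration (with an invented toy corpus), the sketch below counts bigram frequencies and estimates the probability of the next word from them, with no grammar rules involved.

```python
# A minimal sketch of a data-driven bigram model over a toy corpus.
from collections import Counter, defaultdict

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

# Count how often each word follows each preceding word.
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        bigram_counts[prev][nxt] += 1

def next_word_probability(prev, nxt):
    """Estimate P(next | prev) as a relative bigram frequency."""
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][nxt] / total if total else 0.0

print(next_word_probability("the", "cat"))  # 2 of 6 continuations -> ~0.33
```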

The paradigm of “machine learning” has come to dominate Artificial Intelligence in recent decades: algorithms infer the likely answers from data previously observed in the corpus.
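
A minimal sketch of that paradigm, assuming the scikit-learn library and an invented four-example corpus: the classifier infers the label of an unseen sentence purely from word frequencies observed during training.

```python
# A minimal machine-learning sketch: learn sentiment labels from examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "great movie, loved it",
    "terrible plot, awful acting",
    "wonderful and moving",
    "boring and painful to watch",
]
train_labels = ["positive", "negative", "positive", "negative"]

# Turn texts into word counts, then fit a Naive Bayes classifier on them.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

print(model.predict(["what a wonderful film"]))  # likely ['positive']
```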

Components of natural language processing

Next, we look at some of the components of natural language processing. Not all of the analyses described apply to every NLP task; which ones are needed depends on the purpose of the application.

  1. Morphological or lexical analysis. It consists of the internal analysis of the words that form sentences to extract lemmas, inflectional features, and compound lexical units. It is essential for obtaining basic information: syntactic category and lexical meaning.
  2. Syntactic analysis. It analyzes the sentence structure according to the grammatical model used (logical or statistical).
  3. Semantic analysis. It interprets the sentences once the morphosyntactic ambiguities have been eliminated.
  4. Pragmatic analysis. It incorporates the analysis of the context of use into the final interpretation. This includes treating figurative language (metaphor and irony) as well as the specific world knowledge necessary to understand a specialized text.

Morphological, syntactic, semantic, or pragmatic analysis is applied depending on the objective of the application. For example, a text-to-speech converter does not need semantic or pragmatic analysis, but a conversational system requires detailed context and subject-domain information. The sketch below illustrates the first two levels.
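
As a minimal sketch of morphological and syntactic analysis, the following uses the open-source spaCy library (an assumed toolkit choice, not one named in this article; it requires downloading the `en_core_web_sm` model) to extract lemmas, syntactic categories, inflectional features, and dependency relations from an invented sentence.

```python
# A minimal sketch of morphological and syntactic analysis with spaCy.
# Setup (assumed): pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cats were sleeping on the old mat.")

for token in doc:
    # lemma_ -> lemma, pos_ -> syntactic category,
    # morph -> inflectional features, dep_ -> syntactic relation
    print(f"{token.text:10} lemma={token.lemma_:10} pos={token.pos_:6} "
          f"morph={token.morph} dep={token.dep_}")
```

For “cats”, this yields the lemma “cat”, the category NOUN, and the feature Number=Plur: exactly the kind of basic information that the higher levels of analysis build on.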
