An intro to Natural Language Processing

Understanding a Language Starts with Checking a Dictionary

I randomly browsed through the Oxford Dictionary of English (https://en.oxforddictionaries.com) today, on a quest to boil language down to the “first principle”.

I first checked something that a kid probably learns at the age of 3 — the meaning of “good” — “To be desired or approved of”.

Pretending not to know what “desire” means, I checked again — “Strongly wish for or want something”.

Again, what is “wish”? “Feel or express a strong desire”.

Oh dear, it is a magical circular reference. I am stuck and can only flip between “desire” and “wish” forever. From the dictionary alone, there is no way to truly understand the meaning of “desire” or “wish”.

You might not notice this immediately if you look up a harder word, because it can be explained with simpler words. The same goes for learning a foreign language. But if you look up the simplest words, this circular or even self-referencing shows up. To take one more example, the definition of “meaning” reads “what is meant by a word, text, concept, or action.”

Meaning Is Assigned by Us

The only explanation is that meaning is assigned to words by us. We think we understand them because everyone knows them, but in fact we have merely memorized the assignments of meaning, not derived them from spelling (well, occasionally spelling helps, since some words contain components you already know). The spelling itself has no meaning in most cases.

Once we assign a meaning to a word, we can infer the meanings of others. It’s like a network of words. For example, if we understand “desire”, we can get to “wish” and “good”. Even if we don’t know the meaning of “desire”, we can still tell that “wish” and “good” are close to “desire” when they keep occurring in the same contexts.

Although the meaning of a word is assigned, we can guess it by observing its context. This is natural to human beings in language comprehension. It links to an important concept — the Distributional Hypothesis (words that occur in the same contexts are more likely to have similar meanings), which laid the foundation of Natural Language Processing (NLP).
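
To make the hypothesis concrete, here is a minimal sketch in Python; the toy corpus, window size, and word choices below are all made up for illustration. It counts which words appear near which, then compares the resulting context vectors:

```python
# A toy illustration of the Distributional Hypothesis (the corpus
# and window size are invented for this example).
import numpy as np

corpus = [
    "a good meal is what i desire",
    "a good meal is what i wish for",
    "i desire a quiet morning",
    "i wish for a quiet morning",
]

tokens = [sentence.split() for sentence in corpus]
vocab = sorted({w for sent in tokens for w in sent})
index = {w: i for i, w in enumerate(vocab)}

# Count how often each word appears within 2 words of another.
window = 2
counts = np.zeros((len(vocab), len(vocab)))
for sent in tokens:
    for i, word in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                counts[index[word], index[sent[j]]] += 1

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# "desire" and "wish" never appear in the same sentence, yet their
# context vectors are similar because they share neighbours.
print(cosine(counts[index["desire"]], counts[index["wish"]]))  # ~0.72
print(cosine(counts[index["desire"]], counts[index["meal"]]))  # ~0.38
```

Even this crude counting pulls “desire” and “wish” together, without consulting any dictionary at all.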

That is the rough idea behind word embeddings in NLP. The fancy term “word embedding” just means that we bake meaning into a series of numbers (a vector) that a computer can understand and use for downstream tasks such as sentiment analysis.
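
In practice nobody builds these vectors by hand; they are learned from large corpora. As a hedged sketch, here is what that workflow looks like with gensim’s Word2Vec (gensim is a third-party library you would need to install, and this three-sentence corpus is far too small to learn anything meaningful; it only shows the shape of the API):

```python
# Sketch of learning word embeddings with gensim's Word2Vec.
# The corpus is a placeholder; real embeddings need millions of words.
from gensim.models import Word2Vec

sentences = [
    "i desire a good meal".split(),
    "i wish for a good meal".split(),
    "we approved of the plan".split(),
]

# vector_size: length of each embedding; window: context size.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)

print(model.wv["good"])                       # the vector assigned to "good"
print(model.wv.most_similar("good", topn=3))  # nearest words by cosine similarity
```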

Machines Can Learn How We Assign Meaning

Going back to the question — do we understand it? None of us do, and neither does a computer. We don’t understand it; we assign meaning to it, and we know “good” links to “desire” and “wish”. A computer can’t understand the spelling “good” either, but it can learn our assignment distribution — that “good” is similar to “desire” and “wish”. After all, the assignment of meaning is still a property of humans.

P.S. Chinese might be a slightly different story, because some characters evolved from pictograms.
