DEV Community

mohaned mashaly


NLP in 3 minutes

What is NLP:

NLP stands for Natural Language Processing, a sub-field of AI that tries to interpret and understand human language through different information extraction techniques. NLP aims to make interaction between computers and humans more seamless, to the point where a computer can understand and use human language.

NLP has two main fields under its hood: NLU (Natural Language Understanding) and NLG (Natural Language Generation). NLU tries to understand human text and speech by extracting data from different sources (blogs, articles, etc.), while NLG tries to generate text in a given language. Imagine someone trying to write a paragraph in a language other than their native one without studying it first (a computer's native language is 0s and 1s). Of course they can't, and neither can a machine. That is why an NLP pipeline starts with NLU steps and only then moves on to NLG ones: nobody can write a paragraph or summarise a text without understanding it first.

What is NLTK:

NLTK is the Natural Language Toolkit, a Python NLP library used to perform NLP tasks such as named-entity recognition, training on a corpus, POS (part-of-speech) tagging, stemming, lemmatizing and a lot more.

Some NLP definitions:

1. Corpus:
A corpus is a very simple term: it's a collection of texts
(words and sentences) used for analysis or training. NLTK ships
corpora in different languages, covering dialogues from movies,
quotes, names, etc. The languages I worked with in NLTK were
English, German and Arabic, but I believe it supports more.
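
To make the idea concrete, here is a rough sketch that treats a corpus as a plain list of texts and counts word frequencies. The three sentences are made up for illustration (they are not one of NLTK's corpora), and `word_frequencies` is a hypothetical helper name.

```python
from collections import Counter

# A tiny, made-up corpus; a real one would hold thousands of documents.
toy_corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a dog barked",
]

def word_frequencies(corpus):
    """Count how often each word appears across all texts in the corpus."""
    counts = Counter()
    for text in corpus:
        # Lowercase and split on whitespace: the simplest possible tokenizer.
        counts.update(text.lower().split())
    return counts

frequencies = word_frequencies(toy_corpus)
print(frequencies["the"])  # 4
print(frequencies["cat"])  # 2
```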

2. POS (part-of-speech) tagging:
Part-of-speech tagging labels each word in a sentence as a
noun, adverb, verb, adjective, pronoun or one of many more
categories. I was fascinated by it when I first saw it in NLTK.
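
A toy sketch of what a tagger returns, assuming a tiny hand-written lexicon: each token is paired with a tag. The lexicon and `toy_pos_tag` are hypothetical, and the tags follow the Penn Treebank style (DT determiner, NN noun, VBD past-tense verb, RB adverb, JJ adjective, PRP pronoun). Real taggers learn from a tagged corpus and use context, which a plain lookup like this cannot do.

```python
# Hypothetical mini-lexicon mapping words to Penn Treebank-style tags.
TOY_LEXICON = {
    "the": "DT",
    "dog": "NN",
    "cat": "NN",
    "chased": "VBD",
    "quickly": "RB",
    "small": "JJ",
    "she": "PRP",
}

def toy_pos_tag(tokens, default_tag="NN"):
    """Tag each token by dictionary lookup, falling back to a default tag."""
    return [(token, TOY_LEXICON.get(token.lower(), default_tag)) for token in tokens]

tagged = toy_pos_tag(["She", "quickly", "chased", "the", "small", "cat"])
print(tagged)
```

Unknown words simply get the default tag here, which is one reason lookup alone is not enough in practice.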

3. Lemmatizing:
Lemmatizing is a well-known technique from linguistics; it's
not restricted to NLP. In fact, a lot of the techniques
presented so far relate more to linguistics than to computer
science or ML, since NLP sits at the junction of linguistics
and machine learning (or we can consider it one of AI's
application domains in linguistics). What lemmatizing basically
does is convert a word to its base dictionary form (its lemma),
taking the word's part of speech into account: for example,
"running" as a verb becomes "run" and "better" as an adjective
becomes "good".
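
Here is a minimal sketch of the idea, assuming a hand-written lookup table keyed by word and part of speech. `TOY_LEMMAS` and `toy_lemmatize` are hypothetical names; a real lemmatizer relies on a full dictionary rather than five entries.

```python
# Hypothetical lookup table: (word, part of speech) -> base form.
TOY_LEMMAS = {
    ("better", "adj"): "good",
    ("running", "verb"): "run",
    ("was", "verb"): "be",
    ("mice", "noun"): "mouse",
    ("studies", "noun"): "study",
}

def toy_lemmatize(word, pos):
    """Return the base form of a word, or the word unchanged if unknown."""
    return TOY_LEMMAS.get((word.lower(), pos), word)

print(toy_lemmatize("better", "adj"))    # good
print(toy_lemmatize("running", "verb"))  # run
print(toy_lemmatize("mice", "noun"))     # mouse
```

The part of speech matters: the same spelling can have different base forms depending on how it is used in the sentence.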

4. Stemming:
A cruder relative of lemmatizing: it strips word endings by
rule, so the result is not always a real word. Personally I
don't find stemming useful (I don't understand its use);
lemmatizing is much better and more accurate.
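
Stemming usually means chopping off word endings by rule. The sketch below is a deliberately naive suffix stripper written for this post (it is not the Porter algorithm or any NLTK stemmer), and `toy_stem` and its suffix list are made up. It shows why stems are often not real words.

```python
SUFFIXES = ("ing", "ies", "ed", "es", "s")  # checked in this order

def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print(toy_stem("jumped"))   # jump
print(toy_stem("running"))  # runn
print(toy_stem("studies"))  # stud
print(toy_stem("better"))   # better
```

"runn" and "stud" are not words, and "better" never becomes "good": rules that only look at endings cannot know the dictionary form.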

5. Named-entity recognition:
Named-entity recognition sorts words or phrases into categories
such as person, company, city, etc. It determines whether a
word in a sentence is a person's name, a company's, a city's or
whatever category it belongs to. A named-entity recognition
algorithm is a model trained on different corpora to learn how
to categorise words into entities.
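
To show the kind of output to expect, here is a toy sketch that swaps the trained model for a simple lookup list (a gazetteer). The entries, labels and `toy_find_entities` are all hypothetical; a real recognizer learns from corpora and can label names it has never seen, which this lookup cannot.

```python
# Hypothetical gazetteer: known names mapped to entity categories.
TOY_GAZETTEER = {
    "elon musk": "PERSON",
    "spacex": "ORGANIZATION",
    "nasa": "ORGANIZATION",
    "mars": "LOCATION",
}

def toy_find_entities(tokens, max_len=2):
    """Greedily match the longest known phrase starting at each token."""
    entities = []
    i = 0
    while i < len(tokens):
        for length in range(max_len, 0, -1):
            if i + length > len(tokens):
                continue  # not enough tokens left for a phrase this long
            phrase = " ".join(tokens[i:i + length])
            label = TOY_GAZETTEER.get(phrase.lower())
            if label:
                entities.append((phrase, label))
                i += length
                break
        else:
            i += 1  # no entity starts at this token
    return entities

tokens = "Elon Musk wants to colonize Mars with SpaceX and Nasa".split()
print(toy_find_entities(tokens))
```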

6. Word tokenization:
Anyone familiar with Python will find this concept fairly
simple: a tokenizer breaks text into pieces. NLTK has an
awesome feature here: there are two types of tokenizers, word
and sentence. The word tokenizer breaks a paragraph into words,
while the sentence tokenizer breaks a paragraph into smaller
sentences.
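
A rough sketch of both kinds of tokenizer, using only Python's `re` module. `toy_word_tokenize` and `toy_sent_tokenize` are hypothetical names and far simpler than NLTK's tokenizers: the sentence splitter, for example, would wrongly break after an abbreviation like "Dr.".

```python
import re

def toy_word_tokenize(text):
    """Split text into words and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def toy_sent_tokenize(text):
    """Split text after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

paragraph = "NLTK is fun. Is it hard? Not really!"
print(toy_sent_tokenize(paragraph))
# ['NLTK is fun.', 'Is it hard?', 'Not really!']
print(toy_word_tokenize("Is it hard?"))
# ['Is', 'it', 'hard', '?']
```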

NLP in action:
Now I'll write two small programs to demonstrate in practice
how useful NLTK can be. Try to guess the concepts used in the
programs.

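   import nltk
   from nltk.tokenize import word_tokenize

   # One-time setup; resource names can differ between NLTK versions:
   # nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

   text = ("I am wondering who did produce Tom and Jerry Series is "
           "it Disney or Netflix?")
   # Split the sentence into a list of word tokens
   tokenized_sentence = word_tokenize(text)
   # pos_tag expects the list of tokens, not the raw string
   tagged_sentence = nltk.pos_tag(tokenized_sentence)
   for word in tokenized_sentence:
       print(word)
   print(tagged_sentence)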

   # Program 2 (may also need the "maxent_ne_chunker" and "words" resources)
   import nltk
   from nltk.tokenize import word_tokenize

   text = ("Elon Musk wants to Colonize mars but i believe "
           "SpaceX and even Nasa still have a long road to take")
   tokenized_sentence = word_tokenize(text)
   tagged_sentence = nltk.pos_tag(tokenized_sentence)
   # Group the tagged tokens into labelled chunks (the result is a tree)
   sentence_entities = nltk.chunk.ne_chunk(tagged_sentence)
   print(sentence_entities)

Fun to Read:
https://www.datacamp.com/community/tutorials/text-analytics-beginners-nltk
https://pythonprogramming.net/lemmatizing-nltk-tutorial
https://medium.com/m/global-identity?redirectUrl=https%3A%2F%2Ftowardsdatascience.com%2Fnatural-language-processing-nlp-top-10-applications-to-know-b2c80bd428cb
