While reading the official documentation for NLTK (Natural Language Toolkit), I tried extracting the words that appear most frequently in a sample text. In this post, I display the three most frequent words.
Development
- Python
- NLTK
Install NLTK
$ pip install nltk
Extract High-frequency words
Let the coding begin. First, download punkt and averaged_perceptron_tagger, which are required for word tokenization and part-of-speech tagging. Next, read a sample text and tokenize it into words. Then remove everything that is not a noun from the result. Finally, get the most frequent words.
Download
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
Import nltk, and then download punkt and averaged_perceptron_tagger. Once they are downloaded in the environment, you don't have to do it again.
Tokenize the text into words
raw = open('sample.txt').read()
tokens = nltk.word_tokenize(raw)
text = nltk.Text(tokens)
tokens_l = [w.lower() for w in tokens]
Prepare an essay or other long text as sample.txt. After reading it, tokenize it into words. Then convert uppercase to lowercase so that words differing only in case are recognized as the same.
Extract only nouns
pos = nltk.pos_tag(tokens_l)                   # tag each token with its part of speech
only_nn = [x for (x, y) in pos if y == 'NN']   # keep singular common nouns only
freq = nltk.FreqDist(only_nn)                  # count occurrences of each noun
nltk.pos_tag tags each token with its part of speech, and the list comprehension keeps only the tokens tagged NN (singular common nouns). FreqDist then counts how often each remaining word occurs.
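To see what FreqDist does on its own, here is a minimal demo on a made-up word list (no corpus downloads needed, since FreqDist behaves like collections.Counter):

```python
import nltk

# Hypothetical word list standing in for the noun-only output above
words = ['cat', 'dog', 'cat', 'bird', 'cat', 'dog']

freq = nltk.FreqDist(words)   # counts each distinct word
print(freq['cat'])            # 3
print(freq.most_common(2))    # [('cat', 3), ('dog', 2)]
```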
Get the three most frequent words
print(freq.most_common(3))
After counting word frequencies, you can get the top three with most_common(3).