Suhem Parack

Posted on Jun 2, 2022

Exploring Tweets from a user's reverse chronological timeline

#python #beginners #api #twitter

The reverse chronological timeline endpoint in the Twitter API v2 returns Tweets that appear in a user's home timeline, with the most recent Tweets first. In this short tutorial, I will show how you can get these Tweets using the Tweepy package in Python and perform some basic exploratory analysis on them. Specifically, we will learn how to get:

Recent Tweets that appear in a user's home timeline
Timestamp for the first and last Tweet in the timeline
Most liked Tweet in the timeline
Different languages that appear in the timeline
Common topics that appear in the timeline
Accounts with the most Tweets that appear in the timeline
Types of Tweets that appear in the timeline

In order to use the Twitter API v2, you need to sign up for a Twitter developer account. Once you have signed up, you will need to obtain your keys and tokens to connect to the Twitter API in Python, using Tweepy. Note: the reverse chronological timeline endpoint only works with a user access token; it does not support app-only authentication. Finally, make sure you have Python installed on your machine and that you have the most recent version of Tweepy installed by running:

pip3 install tweepy --upgrade

Getting the 800 most recent Tweets from the reverse chronological timeline

To get Tweets from the reverse chronological timeline with Tweepy, you first have to initialize the client (which makes the API calls for you) with your consumer_key, consumer_secret, access_token and access_token_secret. Next, you can use the get_home_timeline function. Each call returns a maximum of 100 Tweets, so if you want more Tweets, you have to use the Paginator functionality in Tweepy and tell it how many requests to make with limit. For example, to get 800 Tweets (as shown in the code below), set max_results=100 and limit=8 on the Paginator, i.e. 8 requests of 100 Tweets each.
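To make the difference between max_results (Tweets per request) and limit (number of requests) concrete, here is a toy, network-free sketch of page-limited pagination. fake_home_timeline and paginate are hypothetical helpers written only for illustration; this is not Tweepy's implementation, just the general pattern of passing the previous response's next_token back as pagination_token for at most limit requests.

```python
from typing import Callable, Iterator, Optional


def fake_home_timeline(max_results: int, pagination_token: Optional[str] = None) -> dict:
    """Hypothetical stand-in for the endpoint: 1,000 fake Tweet IDs, newest first."""
    all_ids = list(range(1000, 0, -1))
    start = int(pagination_token) if pagination_token else 0
    page = all_ids[start:start + max_results]
    meta = {"result_count": len(page)}
    # Only hand out a next_token while there are more Tweets to serve
    if start + max_results < len(all_ids):
        meta["next_token"] = str(start + max_results)
    return {"data": page, "meta": meta}


def paginate(fetch: Callable[..., dict], limit: int, **params) -> Iterator[dict]:
    """Yield at most `limit` responses, following meta.next_token between requests."""
    token = None
    for _ in range(limit):
        response = fetch(pagination_token=token, **params)
        yield response
        token = response["meta"].get("next_token")
        if token is None:  # no more pages available
            break


pages = list(paginate(fake_home_timeline, limit=8, max_results=100))
tweet_ids = [tweet_id for page in pages for tweet_id in page["data"]]
print(len(pages), len(tweet_ids))  # 8 800
```

Asking for limit=20 against this fake endpoint yields only 10 pages, because it runs out of Tweets; in the same way, limit is an upper bound on requests, not a guarantee of a fixed number of Tweets.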

Also, by default the Twitter API v2 returns only the Tweet ID and text for a Tweet. If you want additional data, such as the time the Tweet was created, the language of the Tweet, or metrics (such as like_count), you have to request it explicitly using fields and expansions. In this example, because we want the username of the person Tweeting, we set expansions=['author_id']. Then we create a users dictionary with the user ID as the key and the user information (name, username, etc.) as the value, so that we can easily look up the author of each Tweet.
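The author_id expansion returns the referenced users once, in a separate includes.users list, rather than embedding them in every Tweet. A minimal sketch of the lookup pattern on a hand-written payload shaped like a v2 response (the IDs, names and text are made up; in the raw JSON, IDs are strings, so both sides of the lookup use strings here).

```python
import json

# A hand-written payload in the shape of a v2 timeline response requested
# with expansions=author_id (IDs, names and text are made up)
raw = json.loads("""
{
  "data": [
    {"id": "3", "text": "Third Tweet", "author_id": "10"},
    {"id": "2", "text": "Second Tweet", "author_id": "20"},
    {"id": "1", "text": "First Tweet", "author_id": "10"}
  ],
  "includes": {
    "users": [
      {"id": "10", "name": "Alice", "username": "alice"},
      {"id": "20", "name": "Bob", "username": "bob"}
    ]
  }
}
""")

# Build the lookup table once per page: user ID -> user object
users = {u["id"]: u for u in raw["includes"]["users"]}

# Resolve each Tweet's author with a dictionary lookup instead of a search
pairs = [(tweet["id"], users[tweet["author_id"]]["username"]) for tweet in raw["data"]]
for tweet_id, username in pairs:
    print(tweet_id, username)
```

In this sample, alice appears only once in includes.users even though she wrote two of the Tweets; the dictionary makes each author lookup a constant-time operation.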

In the example below, we create a dictionary called tweets_dict with the Tweet ID as the key and, as the value, a dictionary holding the Tweet text, the time the Tweet was created, its language, its number of likes, its context annotations and the username of the person who posted it.

import tweepy

client = tweepy.Client(consumer_key='REPLACE_ME',
                       consumer_secret='REPLACE_ME',
                       access_token='REPLACE_ME',
                       access_token_secret='REPLACE_ME')

tweets_dict = dict()

# limit=8 caps the Paginator at 8 requests; with max_results=100 per request,
# that returns at most the 800 most recent Tweets
for response in tweepy.Paginator(client.get_home_timeline,
                                 max_results=100,
                                 tweet_fields=['created_at', 'lang', 'context_annotations', 'public_metrics', 'referenced_tweets'],
                                 expansions=['author_id'],
                                 limit=8):
    # response.data is None if a page comes back empty
    tweets = response.data or []
    # Map each author's user ID to their User object for quick lookups
    users = {u["id"]: u for u in response.includes.get('users', [])}

    for tweet in tweets:
        user = users[tweet.author_id]
        tweets_dict[tweet.id] = {
            "id": tweet.id,
            "text": tweet.text,
            "created_at": tweet.created_at,
            "lang": tweet.lang,
            "like_count": tweet.public_metrics['like_count'],
            "context_annotations": tweet.context_annotations,
            "username": user.username
        }

# Number of Tweets collected
print(len(tweets_dict))

First and last Tweet creation timestamp from the timeline

Now that we have the recent Tweets from the user's home timeline, we can take the first and last entries of tweets_dict to find the most recent and the oldest Tweet we collected. This works because the API returns Tweets newest first and Python dictionaries preserve insertion order. In the example below, we also print the time difference between these two Tweets.

first_tweet = tweets_dict[list(tweets_dict)[0]]
last_tweet = tweets_dict[list(tweets_dict)[-1]]

print("First Tweet in timeline is {} created at {}".format(first_tweet['id'], first_tweet['created_at']))
print("Last Tweet in timeline is {} created at {}".format(last_tweet['id'], last_tweet['created_at']))

print("Number of days between first and last Tweet: {}".format(first_tweet['created_at'] - last_tweet['created_at']))

In my case, I got the following response:

First Tweet in timeline is 1532378597620473856 created at 2022-06-02 15:08:22+00:00
Last Tweet in timeline is 1444321306330107905 created at 2021-10-02 15:20:08+00:00
Number of days between first and last Tweet: 242 days, 23:48:14
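If you would rather not depend on dictionary order, a sketch that picks the newest and oldest Tweets by created_at with max() and min(), and reports whole days via timedelta.days. The created dictionary is a hypothetical stand-in for tweets_dict, filled with the two timestamps from the output above.

```python
from datetime import datetime, timezone

# Hypothetical sample: Tweet ID -> creation time (the two timestamps from the output above)
created = {
    1532378597620473856: datetime(2022, 6, 2, 15, 8, 22, tzinfo=timezone.utc),
    1444321306330107905: datetime(2021, 10, 2, 15, 20, 8, tzinfo=timezone.utc),
}

# Pick Tweets by timestamp rather than by position in the dictionary
newest_id = max(created, key=created.get)
oldest_id = min(created, key=created.get)
span = created[newest_id] - created[oldest_id]

print(newest_id, oldest_id)
print(span)       # 242 days, 23:48:14
print(span.days)  # 242
```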

Most liked Tweet from the timeline

To find the most liked Tweet in the timeline, we can sort tweets_dict by like_count in descending order; the first Tweet printed is the most liked one.

for k,v in sorted(tweets_dict.items(), key=lambda x: x[1]['like_count'], reverse=True):
    print(k,v['like_count'])
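If you only need the single most liked Tweet, or the top few, you don't have to sort the whole dictionary. A sketch using max() and heapq.nlargest on a made-up sample with the same shape as tweets_dict.

```python
import heapq

# Hypothetical sample with the same shape as tweets_dict (only the fields used here)
tweets_dict = {
    101: {"username": "alice", "like_count": 12},
    102: {"username": "bob", "like_count": 57},
    103: {"username": "carol", "like_count": 3},
    104: {"username": "dave", "like_count": 40},
}

# The single most liked Tweet: one pass, no sorting
most_liked_id, most_liked = max(tweets_dict.items(), key=lambda item: item[1]["like_count"])
print(most_liked_id, most_liked["like_count"])  # 102 57

# The top 3 Tweets by likes, without sorting the whole dictionary
top3 = heapq.nlargest(3, tweets_dict.items(), key=lambda item: item[1]["like_count"])
for tweet_id, tweet in top3:
    print(tweet_id, tweet["username"], tweet["like_count"])
```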

Different languages present in the timeline

To see which languages appear in the timeline, we create a languages dictionary, count how many times each language appears, and then print the languages sorted by count in descending order.

languages = dict()

for key, value in tweets_dict.items():
    if value['lang'] not in languages:
        languages[value['lang']] = 1
    else:
        languages[value['lang']] = languages[value['lang']] + 1

for k, v in sorted(languages.items(), key=lambda item: item[1], reverse=True):
    print(k, v)

In my case, it gave me the following response (und is the code the API uses when the language of a Tweet could not be determined):

en 727
und 9
fr 5
in 3
es 2
tr 2
hi 1
pl 1
ar 1
ja 1
ro 1
it 1
tl 1
ca 1
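The count-then-sort pattern can also be written with the standard library's collections.Counter, whose most_common() method returns (value, count) pairs from largest to smallest. A sketch on a made-up sample, which also computes the share of Tweets whose language was detected (i.e. not und).

```python
from collections import Counter

# Hypothetical sample with the same shape as tweets_dict (only the "lang" field)
tweets_dict = {
    1: {"lang": "en"},
    2: {"lang": "fr"},
    3: {"lang": "en"},
    4: {"lang": "und"},
    5: {"lang": "en"},
}

# Counter does the counting; most_common() returns (language, count) pairs, largest first
languages = Counter(value["lang"] for value in tweets_dict.values())

for lang, count in languages.most_common():
    print(lang, count)

# Share of Tweets whose language could be determined
determined = sum(count for lang, count in languages.items() if lang != "und")
print(f"{determined / len(tweets_dict):.0%} of Tweets have a detected language")  # 80% ...
```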

Most common topics that appear in the timeline

The Twitter API v2 supports Tweet annotations, which provide contextual information about a Tweet and return named entities present in it. Each context annotation contains a domain and an entity; the complete list of supported domains is available in the Twitter API documentation on Tweet annotations. To find common topics, we create a topics dictionary and count how many times each entity name appears.

topics = dict()

for key, value in tweets_dict.items():
    # Annotations of the current Tweet in tweets_dict (an empty list if there are none)
    annotations = value.get('context_annotations') or []
    for annotation in annotations:
        if 'entity' in annotation:
            entity = annotation['entity']
            if 'name' in entity:
                name = entity['name']
                # Increment the count for this entity name, starting from 0
                topics[name] = topics.get(name, 0) + 1

for k, v in sorted(topics.items(), key=lambda item: item[1], reverse=True):
    print(k, v)

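In my case, the response I got is shown below. Note that both counts equal the total number of Tweets collected (756). That is what happens if the loop reads tweet['context_annotations'] (the last Tweet left over from the fetch loop) instead of value['context_annotations'], because every iteration then counts the same Tweet's annotations. With the loop above, the counts reflect each Tweet's own annotations and will likely differ: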

Services 756
Twitter 756
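Because the loop above keys on entity names alone, it merges the same name across domains and can count one Tweet several times for the same name. A sketch that keeps the domain next to each entity and counts each (domain, entity) pair at most once per Tweet, using collections.Counter; the sample annotations are made-up, illustrative values in the {"domain": ..., "entity": ...} shape the API returns.

```python
from collections import Counter

# Hypothetical sample: each value holds context_annotations shaped like the API's
# objects ({"domain": {...}, "entity": {...}}); names here are illustrative only
tweets_dict = {
    1: {"context_annotations": [
        {"domain": {"name": "Brand Category"}, "entity": {"name": "Services"}},
        {"domain": {"name": "Brand"}, "entity": {"name": "Twitter"}},
    ]},
    2: {"context_annotations": [
        {"domain": {"name": "Brand"}, "entity": {"name": "Twitter"}},
    ]},
    3: {"context_annotations": []},
}

topics = Counter()
for value in tweets_dict.values():
    # Collect each (domain, entity) pair once per Tweet, so counts mean "number of Tweets"
    pairs = {
        (annotation["domain"]["name"], annotation["entity"]["name"])
        for annotation in value.get("context_annotations") or []
        if "domain" in annotation and "entity" in annotation
    }
    topics.update(pairs)

for (domain, entity), count in topics.most_common():
    print(f"{domain} / {entity}: {count}")
```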

Most common accounts that appear in the timeline

In order to get the most common accounts that appear in the timeline, we can create a usernames dictionary and count the number of times a username appears in the timeline, and then we can reverse sort and print it.

usernames = dict()

for key, value in tweets_dict.items():
    if value['username'] not in usernames:
        usernames[value['username']] = 1
    else:
        usernames[value['username']] = usernames[value['username']] + 1

for k, v in sorted(usernames.items(), key=lambda item: item[1], reverse=True):
    print(k, v)

In my case, I got the following response:

suhemparack 374
icahdq 228
TwitterDev 114
hackingcommsci 18
TwitterAPI 16
SentimentsDev 6
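Raw counts are easier to compare as shares of the timeline; in the output above, for instance, suhemparack accounts for 374 of the 756 Tweets, roughly half. A sketch that prints each account's percentage next to its count, using collections.Counter on a made-up sample with the same shape as tweets_dict.

```python
from collections import Counter

# Hypothetical sample with the same shape as tweets_dict (only the "username" field)
tweets_dict = {
    1: {"username": "alice"},
    2: {"username": "bob"},
    3: {"username": "alice"},
    4: {"username": "alice"},
}

usernames = Counter(value["username"] for value in tweets_dict.values())
total = sum(usernames.values())

# Fraction of the sampled timeline contributed by each account
shares = {username: count / total for username, count in usernames.items()}
for username, count in usernames.most_common():
    print(f"{username}: {count} ({shares[username]:.1%})")
```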

Types of Tweets present in the timeline

Sometimes, you may want to know how many of the Tweets in the timeline are Original Tweets, Replies, Retweets or Quote Tweets. To do so, request the referenced_tweets field; the determine_tweet_type function then checks whether the first referenced Tweet is replied_to, quoted or retweeted. If the Tweet references no other Tweet, it is an Original Tweet.

import tweepy


def determine_tweet_type(tweet):
    if 'referenced_tweets' in tweet:
        # Check for reply indicator
        if tweet['referenced_tweets'][0]['type'] == "replied_to":
            return "Reply Tweet"
        # Check for quote tweet indicator
        elif tweet['referenced_tweets'][0]['type'] == "quoted":
            return "Quote Tweet"
        # Check for retweet indicator
        elif tweet['referenced_tweets'][0]['type'] == "retweeted":
            return "Retweet"
        else:
            return "Original Tweet"
    else:
        return "Original Tweet"


client = tweepy.Client(consumer_key='REPLACE_ME',
                       consumer_secret='REPLACE_ME',
                       access_token='REPLACE_ME',
                       access_token_secret='REPLACE_ME')

tweets_dict = dict()

# limit=8 caps the Paginator at 8 requests; with max_results=100 per request,
# that returns at most the 800 most recent Tweets
for response in tweepy.Paginator(client.get_home_timeline,
                                 max_results=100,
                                 tweet_fields=['created_at', 'lang', 'context_annotations', 'public_metrics',
                                               'referenced_tweets'],
                                 expansions=['author_id'],
                                 limit=8):
    # response.data is None if a page comes back empty
    tweets = response.data or []
    users = {u["id"]: u for u in response.includes.get('users', [])}

    for tweet in tweets:
        user = users[tweet.author_id]
        tweets_dict[tweet.id] = {
            "id": tweet.id,
            "type": determine_tweet_type(tweet),
            "text": tweet.text,
            "created_at": tweet.created_at,
            "lang": tweet.lang,
            "like_count": tweet.public_metrics['like_count'],
            "context_annotations": tweet.context_annotations,
            "username": user.username
        }

types = dict()

for key, value in tweets_dict.items():
    if value['type'] not in types:
        types[value['type']] = 1
    else:
        types[value['type']] = types[value['type']] + 1

for k, v in sorted(types.items(), key=lambda item: item[1], reverse=True):
    print(k, v)

In my case, I got the following response:

Original Tweet 348
Retweet 195
Quote Tweet 112
Reply Tweet 101
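determine_tweet_type only looks at the first entry of referenced_tweets, but a Tweet can reference more than one Tweet (for example, a reply that also quotes another Tweet). A sketch of a hypothetical classify helper that returns every applicable label, written against plain dictionaries shaped like the API's referenced_tweets objects; with Tweepy objects you may need to adapt how type is read.

```python
def classify(referenced_tweets):
    """Hypothetical helper: return every label that applies to a Tweet.

    referenced_tweets is a list of {"type": ..., "id": ...} dicts, or None
    when the Tweet does not reference any other Tweet.
    """
    types = {ref["type"] for ref in referenced_tweets or []}
    labels = []
    if "retweeted" in types:
        labels.append("Retweet")
    if "replied_to" in types:
        labels.append("Reply Tweet")
    if "quoted" in types:
        labels.append("Quote Tweet")
    # No recognised reference means the Tweet is an original
    return labels or ["Original Tweet"]


print(classify(None))                             # ['Original Tweet']
print(classify([{"type": "quoted", "id": "1"}]))  # ['Quote Tweet']
print(classify([{"type": "replied_to", "id": "1"},
                {"type": "quoted", "id": "2"}]))  # ['Reply Tweet', 'Quote Tweet']
```

With multiple labels per Tweet, a Tweet can land in more than one category, so the category totals may add up to more than the number of Tweets collected.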

I hope this tutorial is helpful to you in learning how to do exploratory analysis on the reverse chronological timeline. If you have any questions or feedback, feel free to reach out to me on Twitter.

DEV Community
