I have been thinking about the people supporting our repo lately. Every new stargazer means a lot to us, as it validates that we are building the right thing.
And sometimes, it means even more…
Jokes aside, during the last few weeks, we've seen our "star history" go up. In less than a couple of months, we made it to 500 stars on GitHub, which is a positive sign.
We're quite new in this space, but all the research we've done suggests that the number of stars is a good proxy for product interest and usage. So we optimized for that, and it's working.
But I still felt we were missing something. Even though we are getting more stars every day, we know little to nothing about our stargazers:
- Who are they?
- What do they work on?
- What other repos interest them?
- Who do they follow?
- Are any relevant companies interested in our product?
A lot of questions were popping up in my mind, and I figured other people might be in the same situation. If we manage to learn all of this, we can solve more problems for them.
I took a look at GitHub and saw I could get a lot of quantitative info, but the process was very manual. Then I searched for existing repos that solve this, but they didn't fit my needs: most of them were very old and I couldn't make them work.
That's why I decided to build my own solution.
Here you can see a live demo (data has been anonymized): https://repo-analysis-new-alfred.latitude.page/
And here you have the link to the repo: https://github.com/latitude-dev/stargazers-analysis
I'll try to explain step by step how I made it, so you can do the same. Hope this helps!
The GitHub API
First thing to figure out: the GitHub API exposes a lot of information about a repo that, when aggregated, can give really good insights. Here's what I needed: a flexible dataset I could explore, aggregate, mix, and share with my peers at Latitude.
At Latitude we have a way to explore and expose data. The easiest way is using our DuckDB connector and .csv files. I was only missing the .csv files.
These are the steps I followed:
1. Using ChatGPT to write a Python script with no prior knowledge
I figured I could write a small script to get the info from the GitHub API. The API seemed simple, and although I like to code, I'm a product designer, so I needed some help.
I asked ChatGPT for a script that:
- Uses the GraphQL API instead of the REST one, because it allows more requests per minute (2,000 vs 900)
- Uses 2 workers to speed up the process. We could use more, but GitHub doesn't recommend it because you hit the API limits too quickly
- Manages retries when we hit the GitHub GraphQL API rate limits. The amount of data is huge, and the API limitations make the process automatic but long; in my case, 500 stargazers took ~2 hours
- Skips users with more than 5k starred repos. We want aggregated quantitative data to detect common patterns, so someone with 40k starred repos wouldn't add much value in exchange for all the extra requests needed
And the data required:
- Users info from Latitudeās stargazers
- Other repositories those users have starred
- The people those users follow
This way, once the script finishes, we have user-details.csv, organizations.csv, following.csv, and repos-starred.csv.
This was the first time I really felt the power of ChatGPT, no bullshit. It wasn't super easy, because of how I learned about the API limitations along the way, but the summary is that I got exactly what I needed. Here's the code.
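To give you an idea of the shape of a script like this, here's a minimal sketch of paging through stargazers with the GraphQL API and backing off on rate limits. This is not the actual code from the repo: the selected fields, the 60-second backoff, and the function name are my own assumptions for illustration.

```python
# Minimal sketch (NOT the repo's actual script): page through a repo's
# stargazers with the GitHub GraphQL API, retrying when rate-limited.
# Workers and the 5k-starred-repos filter from the post are omitted.
import os
import time

import requests

API_URL = "https://api.github.com/graphql"

# Fetch 100 stargazers per page, following the cursor until exhausted.
STARGAZERS_QUERY = """
query($owner: String!, $name: String!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    stargazers(first: 100, after: $cursor) {
      pageInfo { hasNextPage endCursor }
      nodes { login name company location followers { totalCount } }
    }
  }
}
"""

def fetch_stargazers(owner, name, token):
    """Yield stargazer nodes, sleeping and retrying on rate-limit responses."""
    headers = {"Authorization": f"bearer {token}"}
    cursor = None
    while True:
        resp = requests.post(
            API_URL,
            json={"query": STARGAZERS_QUERY,
                  "variables": {"owner": owner, "name": name, "cursor": cursor}},
            headers=headers,
        )
        if resp.status_code in (403, 429):  # rate limited: back off, then retry
            time.sleep(60)
            continue
        resp.raise_for_status()
        page = resp.json()["data"]["repository"]["stargazers"]
        yield from page["nodes"]
        if not page["pageInfo"]["hasNextPage"]:
            break
        cursor = page["pageInfo"]["endCursor"]

# Usage (requires a real token, so not executed here):
#   for user in fetch_stargazers("latitude-dev", "latitude",
#                                os.environ["GITHUB_TOKEN"]):
#       print(user["login"])
```

The cursor-based pagination (`pageInfo.endCursor`) is the key idea: each response tells you where the next page starts, so the loop keeps going until `hasNextPage` is false.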
2. Run the script and get the .csv files
Once the script is in place, let's see how it works. In the repo you can download everything you need. The steps to run the script are the following:
Initial requirements
- Before starting, make sure you have Python installed. You can check the official page here: https://www.python.org/downloads/
- Generate a GitHub Access Token to authenticate with the GitHub API later. You can generate it from https://github.com/settings/tokens
- In the file fetch_stargazers_data.py, replace the repo owner and repo name in lines 14 and 15 with the info of the repo you want to analyze. For example, for github.com/latitude-dev/latitude/ it would look like this:

14 REPO_OWNER = 'latitude-dev'
15 REPO_NAME = 'latitude'
- To run the project, first ensure you have Node.js >= 18 installed. Then, install the Latitude CLI:
npm install -g @latitude-data/cli
Running the script
- Clone the repo
- Open the terminal and go to the root of the repo cloned.
- Run
pip install requests
- Then, run
export GITHUB_TOKEN='YOUR_TOKEN_HERE'
replacing YOUR_TOKEN_HERE with the token generated before.
- And finally, run
python3 fetch_stargazers.py
- Important: the script can take a long time to finish; ~500 stargazers took us ~2 hours, for reference.
When it's finished, it will save 4 .csv files inside the queries folder, which is where the sources must be for Latitude to analyze the data. You can open the .csv files to check the data is there.
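If you want a quick sanity check before moving on, a few lines of Python can count the rows in each generated file. This is just a sketch; the file names come from the post, but the folder layout is an assumption.

```python
# Quick sanity check: count the data rows in each CSV the script produced.
# File names are from the post; the queries/ location is an assumption.
import csv
from pathlib import Path

def count_rows(path):
    """Return the number of data rows (excluding the header) in a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        return sum(1 for _ in csv.DictReader(f))

for name in ["user-details.csv", "organizations.csv",
             "following.csv", "repos-starred.csv"]:
    path = Path("queries") / name
    if path.exists():
        print(f"{name}: {count_rows(path)} rows")
```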
3. Exploring the data with Latitude
At this point, we have the data ready, but we want to visualize it. The cloned repo is prepared to take your .csv files, using Latitude and DuckDB, and build a data visualization app.
- In the queries folder are the data explorations, written in plain SQL
- In the views folder is the frontend for the data, built with Latitude components, Svelte, and HTML
You can modify any .sql query or .html view to adapt it to your needs. You can follow the Latitude docs.
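To illustrate the kind of SQL these explorations run, here's a self-contained sketch of a "top common repos" aggregation. The repo runs its queries through Latitude's DuckDB connector over the .csv files; this example uses Python's built-in sqlite3 instead so it runs anywhere, and the table and column names are my assumptions, not the project's exact schema.

```python
# Sketch of a "top common repos" aggregation: how many of your stargazers
# have starred each other repo. Uses stdlib sqlite3 for portability; the
# real project runs similar SQL with DuckDB over .csv files. Table and
# column names here are illustrative assumptions.
import sqlite3

# Toy stand-in for repos-starred.csv: (stargazer, repo they also starred)
rows = [
    ("alice", "duckdb/duckdb"),
    ("bob", "duckdb/duckdb"),
    ("carol", "duckdb/duckdb"),
    ("alice", "pola-rs/polars"),
    ("bob", "pola-rs/polars"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE repos_starred (stargazer_login TEXT, starred_repo TEXT)")
conn.executemany("INSERT INTO repos_starred VALUES (?, ?)", rows)

top_repos = conn.execute("""
    SELECT starred_repo, COUNT(DISTINCT stargazer_login) AS stargazers
    FROM repos_starred
    GROUP BY starred_repo
    ORDER BY stargazers DESC
    LIMIT 5
""").fetchall()

print(top_repos)  # → [('duckdb/duckdb', 3), ('pola-rs/polars', 2)]
```

The same GROUP BY / COUNT DISTINCT / ORDER BY pattern covers most of the dashboard's "common" lists (repos, followed users, locations, organizations), just over different columns.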
The project is divided into different sections:
- Overview - Top 5 common repos, followed users, organizations, and your stargazers with the most followers. Note that the common lists count how many stargazers of your repo have starred the same other repo, followed the same other user, and so on. This page also shows some analysis of your stargazers at the bottom.
- Users list - The entire list of stargazers of your repo with their info and filters.
- Repos - The entire list of the common repos starred by your stargazers.
- Following - The entire list of the users followed by your stargazers.
- Location - The entire list of the common locations your stargazers are based in.
- Organization - The entire list of the common organizations your stargazers belong to.
- Company - The entire list of the companies your stargazers work for.
To see the app with your project, run
latitude dev
from the terminal at the root of the project.
4. Sharing with the team
Until now, the project has been running locally on your machine, but you probably want to share the info with your team. You have 2 options:
- Deploy on your own. You can follow the docs here.
- Just run
latitude deploy
and we'll handle the deployment in our cloud, giving you a URL to share it easily. You can follow the docs here.
I believe this tool is really valuable for anyone who wants to know more about their GitHub repo and how to better serve their users and stargazers. And since it's open source, everybody can access it, which I feel is a good way to give back to the community that is supporting Latitude.
And that's it. I hope you find value in what I built :)
I'd love to get your feedback in the comments. Let me know if you have any questions.
Top comments (10)
I know I may be the odd one out here, but I find this project tremendously creepy. Stargazers as a status symbol is bad enough, but going through the trouble of trying to identify them seems unethical.
Sometimes the data collection is the vulnerability, and I feel like this is very much one of those times.
I agree about the status symbol thing. I'm a little out of touch I guess - I had no idea "stargazers" was a term - but I fundamentally don't understand the post:
When I see people begging for likes it really puts me off interacting with them or giving them any promotion. The post implies that they need to reach some arbitrary threshold in order to unlock something? If so, that's a terrible system on the part of (presumably) Github.
I think this is the wrong way to look at the world. Correlation not being causation and all that. To come up with an absurd analogy, it's like saying that people are happier when they have pets, and dogs crap on the pavement, so we'll get happier communities if we optimise the streets for poo. Begging people to star your page is going to have a very tiny impact* on how many people are interested in the project, as the sort of person who bends to your will that way has probably starred a gazillion projects already.
Meta-analysis of user data isn't inherently ethical or unethical; it depends on relevance and what you want the data for. Finding out that most of the watchers for your app also follow a lot of developer tools might be good data supporting the idea of exposing an API. But really, you have a platform already and you can use a poll instead, which is inherently opt-in and self-selected for people who are interested.
To be clear, all the info in the app comes from users' public profiles on GitHub, so there really is no shady user identification process going on here: we are just aggregating and organizing public data so that we can better understand the kind of users interested in Latitude. I'd agree this would have been unethical if we had scraped the web for more personal data or, worse, tried to get contact info to poke stargazers.
Point taken on the GitHub stars thing: I partially agree with you on this.
Aggregating data in ways that identify users who didn't expect or consent to it does seem pretty shady to me.
Great project. I love the fact that the data is anonymized 🥰
I'm curious, was there nothing like your project before?
I appreciate your support!
I found this repo: github.com/spencerkimball/stargazers
But the last commit was from 7 years ago and I couldn't make it work.
Thanks for replying to me!
Very cool project to demonstrate what you're building in practice to solve a real world problem.
What are you using for latitude deploy? Docker and EC2?
He is using Latitude Cloud, our managed infra to deploy Latitude projects. It's built on top of Docker and ECS.
Thank you, that is what I meant. The server stack you were using for latitude cloud.