Tomáš Hobza

Why academic papers are broken: from biased citations to impossible verification chains 🎓

Disclaimer: Anything I mention in this blog post is purely my opinion based on my experiences.

🎓 Introduction

Before I started studying at university, I had done quite a few projects - both personal and commercial. I thought university projects wouldn't be all that different, but boy was I wrong.

Obviously, the whole process of working on a university project is much more formal and you simply cannot make decisions based on just "vibes".

What do I mean by this? Let's say you're working on a project - a simple TODO app. You want to add a button that creates a new task entry. Now, where do you put that button? In a personal project you would most likely put it wherever it "feels" right, or where similar apps have put similar buttons. In academic work you cannot just skip the reasoning - you have to provide either an analytical proof that this is the best spot for the button (based on existing studies by people who devoted their entire lives to researching button placement) or a research study in which you test the different placement options on real people.

Now, you might think that this approach is good practice in general. If you're writing airplane control software, you really shouldn't introduce any "Bulgarian constants" (arbitrary fudge factors) - and you're absolutely right!

What I want to focus on is not the fact that you have to write the paper as "proof" - I want to focus on how the "proof" is provided in the resulting paper.

My stance is that the system right now is flawed, and the following chapters are my explanation of why.

🧬 How the academic system works

The premise is quite simple - [your paper] = [your work] + [stuff you based your work on].

In a way you can visualize it as inheritance - your work inherits from other people's works while adding something of its own.

For example: I want to build a neural network -> I need to learn how a neural network works -> I read other people's works -> based on that I implement the neural network. (my addition here being the implementation)

🤔 So what's the problem?

Now this all seems fine - all of math is built on proving things from other things that were already proven, all the way down to axioms that are simply accepted as true. But math has a formal method of proof, and most papers do not!

There have been numerous instances of companies invading the academic space with straight-up bullsh*t for lobbying purposes.

The whole system doesn't work, because we are humans. We are biased and we make mistakes. And we all stand by different values.

⚠️ Is it really a problem?

Okay, but you can just read the paper and tell that it's wrong, right?

If you read a paper and find a mistake - congrats! But what if you are the one who made the mistake? What if your addition is also biased in some other way, given your culture, religion, favourite breakfast cereal, etc.?

It is almost impossible to objectively determine the truthfulness of an academic paper because of these problems:

  1. Mistakes - unintentional
    • The paper itself or any of the sources might contain errors.
    • The error gets multiplied with every subsequent reference to the paper.
  2. Bias - semi-intentional
    • The work on the paper might be generally done with the intention to confirm a given hypothesis.
    • The author might infer causality where there is only a significant correlation and more factors are at play.
    • A brilliant example of this phenomenon is Tyler Vigen's Spurious Correlations website.
  3. Lies - intentional
    • The results of the paper are pre-determined and the paper serves just as the medium that connects the right dots.
    • This is the biggest problem, as intentionally lying authors will focus on making the paper as difficult to disprove as possible.
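
To put a rough number on how errors pile up along a chain of references, here is a minimal sketch. It assumes (my simplification, not a measured fact) that every paper in the chain is independently correct with the same probability. The function name `chain_reliability` and the 95% figure are made up for illustration.

```python
def chain_reliability(p_correct: float, chain_length: int) -> float:
    """Probability that every paper in a chain of references is correct.

    Assumes each paper is independently correct with probability p_correct,
    which is a simplification made for illustration only.
    """
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be between 0 and 1")
    if chain_length < 0:
        raise ValueError("chain_length must be non-negative")
    return p_correct ** chain_length


if __name__ == "__main__":
    # Hypothetical figure: each paper is 95% likely to be error-free.
    for depth in (1, 5, 10, 20):
        print(depth, round(chain_reliability(0.95, depth), 3))
```

Under these assumptions, a chain of ten references is fully correct less than 60% of the time, even though each individual paper looks very trustworthy on its own.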

📝 Written text sucks at storing facts

In my opinion, the biggest issue here is how the sources themselves get used. If I use an equation that another researcher found, it's not hard to check whether it's actually the equation from the source, but what about more abstract stuff?

Let's say that there is a quite lengthy paper about "how social media influences children" and I derive from it that "social media is bad for children and should be banned" ... is that correct?

Suppose the original paper shows not only the negatives of social media for children but also possible benefits. Suppose it never explicitly says that social media is "bad and should be banned", and only points to some issues that might well be resolved by slight changes to how social media works. Then I have referenced that paper very incorrectly, because I introduced a new conclusion that it does not contain.

It is almost impossible to verify these references, because more abstract referencing leaves room for personal interpretation, and any middle-schooler can tell you that one piece of literature can have wildly different meanings depending on the interpretation.

🔍 Proof of what I'm saying

Now this is all just me yapping, with "trust me bro" as the only proof.

Do you want proof? A good way to prove my words is this experiment:

  1. Find an argument that has a good amount of supporters on both sides.
  2. Find genuine and good sources that support and those that oppose the argument.
  3. If successful, you found a contradiction.
  4. Bonus: Find the mistake!

🔐 Trust no-one

I think a good way to visualize this is how DNS works - specifically the DNSSEC chain of trust.

When you want to make sure that a DNS record was not modified (by a shady ISP, for example), you can verify its signature with the authority that gave out the signature.

But what if the DNS record with the verification key was also modified?

Well, you can check with another authority that the first authority's verification key wasn't modified either ... yada yada yada, it bubbles all the way up to the root authority that we all just agree to trust.
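
The walk up to the root can be sketched as a toy model. This is not real DNSSEC: it skips the actual signatures and only models the part where each parent publishes a digest of its child's key. The `Zone` class, the helper names, and the key bytes are all hypothetical and exist only for this illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Zone:
    name: str
    key: bytes                     # stand-in for the zone's public key
    parent: Optional[str]          # None for the root
    child_digests: Dict[str, str]  # digest this zone publishes for each child's key


def digest(key: bytes) -> str:
    return hashlib.sha256(key).hexdigest()


def chain_is_trusted(zones: Dict[str, Zone], start: str, trusted_root_digest: str) -> bool:
    """Walk from `start` up to the root, checking every key against its parent."""
    current = zones[start]
    seen: Set[str] = set()
    while current.parent is not None:
        if current.name in seen:
            return False  # parent links form a loop, so there is no root to trust
        seen.add(current.name)
        parent = zones[current.parent]
        if parent.child_digests.get(current.name) != digest(current.key):
            return False  # this key is not the one the parent vouches for
        current = parent
    # Nobody is above the root: compare it with the anchor we agreed to trust.
    return digest(current.key) == trusted_root_digest


def build_example() -> Dict[str, Zone]:
    # Hypothetical three-level hierarchy: root -> com. -> example.com.
    root = Zone(".", b"root-key", None, {})
    com = Zone("com.", b"com-key", ".", {})
    example = Zone("example.com.", b"example-key", "com.", {})
    root.child_digests["com."] = digest(com.key)
    com.child_digests["example.com."] = digest(example.key)
    return {zone.name: zone for zone in (root, com, example)}
```

Calling `chain_is_trusted(build_example(), "example.com.", digest(b"root-key"))` succeeds, while swapping any key in the middle of the chain makes the check fail. The last line of the function is the important one: at the very top there is nothing left to verify against, only an agreement.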

⛓️ What is the chain of trust in academic papers?

Sourcing from a different piece of work implies that part of your work's factuality is outsourced to that piece of work, thereby chaining the "trust" of your work to other pieces of work.
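
To see how quickly that chain grows, here is a small sketch that collects everything a paper ends up trusting, directly or indirectly. The citation data is a plain dictionary I made up for the example, and `everything_trusted` is a hypothetical helper, not part of any real citation tool.

```python
from typing import Dict, List, Set


def everything_trusted(sources: Dict[str, List[str]], paper: str) -> Set[str]:
    """Return every work that `paper` relies on, directly or indirectly."""
    trusted: Set[str] = set()
    stack = list(sources.get(paper, []))
    while stack:
        work = stack.pop()
        if work in trusted:
            continue  # already visited; this also guards against citation cycles
        trusted.add(work)
        stack.extend(sources.get(work, []))
    trusted.discard(paper)  # a paper reached through a cycle does not count as its own source
    return trusted


if __name__ == "__main__":
    # Hypothetical data: each paper maps to the works it cites.
    sources = {
        "my_paper": ["A", "B"],
        "A": ["C"],
        "B": ["C", "D"],
        "D": ["E"],
    }
    print(sorted(everything_trusted(sources, "my_paper")))
```

In this made-up example the paper cites only two works, yet its correctness depends on five.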

🛡️ Is there any prevention system in place?

So far the only safeguard against a paper being complete garbage is the publisher.

There are multiple companies that publish academic papers in journals, collections, or just on their websites. If you want your paper to be published there, you need to pay a hefty fee and they'll dig deep to check the correctness of the paper ... BUT!

Who is checking the paper? On what basis is its correctness evaluated? Do these people understand what my paper is describing? If I offer a very big sum of money, will they publish anything? ... and here we go again ...

🌅 Epilogue

I call for a better system to be introduced. Designing that system goes beyond the scope of me yapping on the internet, but here are a few characteristics it should have:

  1. A sourcing system that's made for the modern era, unifying the different systems in place.
  2. A global registry of all academic papers that is not behind a paywall. This registry will also handle reviews. (If a paper was proven wrong, that should be publicly stated.)
  3. Readiness for multi-model fact checking, which will most likely be introduced within a couple of years and could provide a more objective way to analyze papers for truth.
  4. If a given paper is proven wrong, all papers that source it should at least be flagged as possibly wrong as well, with an explanation of why.
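
The last point is the easiest one to sketch in code. Assuming the registry stores which works each paper cites (here just a dictionary I made up), flagging is a breadth-first walk over the reversed citations. `flag_dependents` is a hypothetical name, and a real system would need far more nuance than "flag everything".

```python
from collections import deque
from typing import Dict, List


def flag_dependents(sources: Dict[str, List[str]], disproven: str) -> Dict[str, str]:
    """Map each paper that relies on `disproven` to the source that put it in doubt."""
    # Invert "paper -> works it cites" into "work -> papers citing it".
    cited_by: Dict[str, List[str]] = {}
    for paper, refs in sources.items():
        for ref in refs:
            cited_by.setdefault(ref, []).append(paper)

    flagged: Dict[str, str] = {}
    queue = deque([disproven])
    while queue:
        current = queue.popleft()
        for citing in cited_by.get(current, []):
            if citing != disproven and citing not in flagged:
                flagged[citing] = current  # the "why": which source is in doubt
                queue.append(citing)
    return flagged


if __name__ == "__main__":
    # Hypothetical data: B cites A, C cites B, D cites A and X, E cites X.
    sources = {"B": ["A"], "C": ["B"], "D": ["A", "X"], "E": ["X"]}
    for paper, reason in flag_dependents(sources, "A").items():
        print(f"{paper}: possibly wrong because it relies on {reason}")
```

Here `C` gets flagged even though it never cites `A` directly, while `E` stays clean because nothing it relies on was disproven.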

Have you got any other ideas? Kindly let me know and let's open a discussion about it. :)

Thank you for reading and with hopes for a bright future,
Tomáš
