DEV Community

SaptakBhoumik

How do I implement a plagiarism detector completely from scratch and it has to be fast as well?

Top comments (9)

JoelBonetR 🥇 • Edited

If you only need to catch copy-pasted text, standard text-comparison algorithms are fine: measure how much a paragraph differs from what you crawled into your DB, and set a threshold percentage above which you flag it as plagiarism.
For better performance, pick out keywords to index, keep an in-memory copy of the DB (if it suits your budget), and so on. Pretty standard.
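
A minimal Python sketch of that idea, under a few assumptions that are not from this thread: the crawled pages fit in a plain dict, "keywords" are approximated by 3-word shingles, and 0.8 is an arbitrary threshold you would tune yourself. The names `shingles`, `build_index`, and `find_matches` are made up for illustration. The comparison step uses `difflib.SequenceMatcher` from the standard library; a strictly from-scratch version could swap in its own edit-distance routine.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def build_index(corpus, k=3):
    """Map each shingle to the ids of the corpus documents containing it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for sh in shingles(text, k):
            index[sh].add(doc_id)
    return index


def find_matches(paragraph, corpus, index, threshold=0.8, k=3):
    """Return (doc_id, similarity) pairs at or above threshold, best first."""
    # Only documents sharing at least one shingle get the slower full comparison.
    candidates = set()
    for sh in shingles(paragraph, k):
        candidates |= index.get(sh, set())
    matches = []
    for doc_id in candidates:
        ratio = SequenceMatcher(None, paragraph.lower(), corpus[doc_id].lower()).ratio()
        if ratio >= threshold:
            matches.append((doc_id, ratio))
    return sorted(matches, key=lambda m: m[1], reverse=True)


# Hypothetical stand-in for the crawled database.
corpus = {
    "page1": "The quick brown fox jumps over the lazy dog near the river bank.",
    "page2": "Plagiarism detection compares a submission against known sources.",
}
index = build_index(corpus)
suspect = "The quick brown fox jumps over the lazy dog near the river."
matches = find_matches(suspect, corpus, index)
```

The index is what keeps this fast: the expensive character-level comparison only runs against pages that already share some wording with the paragraph, not against the whole DB.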

I remember reading something about doing this with ML, but that's a bit tricky because there's a limited number of ways you can explain something.

That is, there's a limited number of synonyms for a word, a limited number of meanings for an idiom, and so on.

Then each discipline has its own flexibility; there are more ways to explain how gravity works than there are to explain a cooking recipe step by step.

That's to say that training an AI to correctly spot plagiarism without reporting false positives is pretty hard.

Hope it helps somehow

SaptakBhoumik

Thanks for the suggestion:)

SaptakBhoumik

Nah, it is a hobby project. Our school won't give us anything interesting to do 🥲

 
SaptakBhoumik

Thanks:)

SaptakBhoumik • Edited

I have that in a database based on many web pages

TonyTheTonyToneTone

I'd start by learning Pascal, then write some code in C++, and to finish, I'd make something that works in JS.

Feel free to copy my comment.

SaptakBhoumik

How is this related to my question?