Top comments (9)
If you're checking for copy-pasted text, text-comparison algorithms would be fine: measure how much a paragraph differs from what you've crawled into your DB, and set a similarity threshold (a percentage) above which you flag it as plagiarism.
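A rough sketch of that comparison, assuming Python and the standard library's `difflib`. The `find_matches` helper, the dict-shaped `corpus`, and the 0.8 threshold are made up for illustration; you'd tune the threshold against your own data.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting changes don't count as differences.
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    # Ratio in [0, 1]; 1.0 means the normalized texts are identical.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_matches(paragraph: str, corpus: dict[str, str], threshold: float = 0.8) -> list[tuple[str, float]]:
    # corpus is a hypothetical stand-in for the crawled DB: doc id -> text.
    hits = []
    for doc_id, text in corpus.items():
        score = similarity(paragraph, text)
        if score >= threshold:
            hits.append((doc_id, score))
    # Most similar documents first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    corpus = {
        "a": "The quick brown fox jumps over the lazy dog.",
        "b": "Completely unrelated text about cooking pasta.",
    }
    for doc_id, score in find_matches("The quick brown fox jumped over the lazy dog.", corpus):
        print(doc_id, round(score, 2))
```

This compares against every stored text one by one, so it only scales to a small corpus; it's the threshold idea in its simplest form, nothing more.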
If you want better performance, pick out keywords to index, keep an in-memory copy of the DB (if it suits your budget), and so on. Pretty standard.
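One way the keyword-index idea could look, as a minimal sketch: an in-memory inverted index that maps keywords to document ids and is used to shortlist documents before any more expensive text comparison. The stopword list, the `min_shared` cutoff, and the function names are all assumptions for the example.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "and", "over", "about", "with", "for", "that", "this"}


def keywords(text: str) -> set[str]:
    # Lowercased words longer than two characters that are not stopwords.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if len(w) > 2 and w not in STOPWORDS}


def build_index(corpus: dict[str, str]) -> dict[str, set[str]]:
    # Inverted index held in memory: keyword -> ids of documents containing it.
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in corpus.items():
        for word in keywords(text):
            index[word].add(doc_id)
    return index


def candidates(paragraph: str, index: dict[str, set[str]], min_shared: int = 2) -> set[str]:
    # Count shared keywords per document and keep those with at least min_shared.
    counts: dict[str, int] = defaultdict(int)
    for word in keywords(paragraph):
        for doc_id in index.get(word, ()):
            counts[doc_id] += 1
    return {doc_id for doc_id, n in counts.items() if n >= min_shared}


if __name__ == "__main__":
    corpus = {
        "a": "The quick brown fox jumps over the lazy dog.",
        "b": "Completely unrelated text about cooking pasta.",
    }
    index = build_index(corpus)
    print(candidates("A quick brown fox", index))
```

Only the shortlisted documents then need a full comparison, which is where the speedup comes from.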
I remember reading something about doing this with ML, but that's a bit tricky because there's a limited number of ways you can explain something.
I.e., there's a limited number of synonyms for a word, a limited number of meanings for an idiom or a word, and so on.
Then each discipline has its own flexibility; there are more ways to explain how gravity works than there are ways to explain a cooking recipe step by step.
That's to say that training an AI to correctly spot plagiarism without reporting false positives is pretty hard.
Hope it helps somehow
Thanks for the suggestion:)
Nah, it is a hobby project. Our school won't give anything interesting to do 🥲
Thanks:)
I have that in a database based on many web pages
I started by learning Pascal, then wrote some code in C++, and finally made something that works in JS.
Feel free to copy my comment.
How is this related to my question?