There are a number of ways that technology can be used to identify potentially plagiarized content. This post examines the different ways, and how Turnitin uses search technology and content comparison algorithms to help educators help students learn how to use source attribution appropriately.
Plagiarism has always existed as a problem; the origins of the word date back to the 1st century. It's only of late, however, that plagiarism has become a significant concern not just for educators and researchers, but also in the public sphere. New instances of plagiarism seem to hit the news on a daily basis. Whether it's song lyrics, plagiarism by school officials or government ministers, speeches by political figures, or the plagiarism that happens in the classroom, incidents of plagiarism appear to be on the rise everywhere.
We have the internet to thank for that. With the rise of the internet, we've seen exponential growth in content that is created and made readily available almost everywhere. The growth is happening on such a scale, and so fast, that we have no way to accurately determine how many new pages are created each day or how much content currently exists online. In 2013, factshunt.com pegged the total amount of internet content at 14.3 trillion pages. The best estimates suggest there are 47 billion indexed and searchable web pages. To put these numbers into perspective, it would take approximately 300 trillion sheets of paper to print out the entire internet today.
With all of this information so immediately accessible, is it any surprise that we've seen a rise in plagiarism as well? Fortunately, the growth of the internet and our need to search that content have led to developments in web crawling and indexing technology (the latter is used to identify the content that is crawled). Those developments have in turn led to technology that quickly identifies copying and potential plagiarism.
First off, it is important to clarify that plagiarism detection software doesn't specifically identify plagiarism. No software will ever be able to accurately determine intent, and intent is one of the factors that educators consider when looking at incidents of plagiarism in student work. What plagiarism detection software does is identify content similarity matches. That is, the software identifies the text components of a document and compares them to the components, or content, of other work in a database of crawled content. Based on that comparison, the software generates a report that highlights the content matches. Plagiarism detection software crawls and indexes content in much the same way that search engines, like Google, crawl and index web content. The key difference is that plagiarism detection software crawls and indexes content not to make it keyword searchable, but to identify similar content stored in the database of crawled pages.
There are, generally speaking, four different ways to go about doing this. The first way is through keyword analysis. What does that mean? As with a search engine, you enter a keyword and the software scans documents to find instances of that word. Another way to scan text for similarity is to look at groups, or strings, of words. Rather than looking just at individual words, the software looks for strings or sequences of words (say, 3-4 or more words ordered in such a way as to create a sentence or sentence fragment). As you may be able to see already, these two approaches can be pretty effective for identifying strict or exact copying of content from one document to another. The shortcoming of these approaches, however, is that they don't identify paraphrasing, where the ideas and meaning may have been copied but the text is different enough that it doesn't get identified as a match.
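To make the first two approaches concrete, here is a minimal sketch in Python. It is only an illustration of keyword counting and word-string matching in general, not a description of how any particular product works; the function names, the sample sentences, and the choice of 4-word strings are all assumptions made for the example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def keyword_hits(keyword, text):
    """Approach 1: count how often a single keyword appears in the text."""
    return tokenize(text).count(keyword.lower())

def word_ngrams(text, n=4):
    """Return the set of all n-word sequences found in the text."""
    words = tokenize(text)
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_ngrams(doc_a, doc_b, n=4):
    """Approach 2: find the word strings that two documents have in common."""
    return word_ngrams(doc_a, n) & word_ngrams(doc_b, n)

# Hypothetical sample texts, invented for this illustration.
source = "The quick brown fox jumps over the lazy dog near the river bank."
copied = "As noted, the quick brown fox jumps over the lazy dog every day."
paraphrased = "A fast, dark-furred fox leaps above a sleepy hound by the water."

print(len(shared_ngrams(source, copied)))       # several shared 4-word strings
print(len(shared_ngrams(source, paraphrased)))  # none: the paraphrase slips through
print(keyword_hits("fox", paraphrased))         # the keyword alone still matches
```

The copied sentence shares several 4-word strings with the source, while the paraphrased sentence shares none even though it says the same thing. That is the shortcoming described above.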
A better way to get at this type of problem is a third approach: scanning for content matches by looking at the style of the writing and comparing that style to other documents. This is not a strict word-to-word analysis, but an approach that looks at the probability of certain word sequences ("phrases") appearing in one document and then compares them to other documents. The challenge here is that fine-grained, word-for-word matches can get lost. Better yet, why not identify a document's unique "fingerprint," and then compare that fingerprint to others? This last approach, which we will discuss in this blog post, is largely what we do at Turnitin.
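One simple way to picture the style-based approach is to turn each document into a profile of how often each two-word sequence occurs, and then measure how close two profiles are. The sketch below does this with relative bigram frequencies and cosine similarity. Both choices are assumptions made for illustration; real stylometric tools use richer features, and all names here are hypothetical.

```python
import math
import re
from collections import Counter

def bigram_profile(text):
    """Map each adjacent word pair to its relative frequency in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(zip(words, words[1:]))
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()} if total else {}

def cosine_similarity(p, q):
    """Return a score from 0.0 (no shared pairs) to 1.0 (same profile)."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)

# Hypothetical sample texts, invented for this illustration.
doc_a = "It is clear that the data show a trend, and it is clear that the trend holds."
doc_b = "It is clear that the results vary, yet it is clear that the method works."
doc_c = "Quantum fields interact weakly under extreme cold."

print(cosine_similarity(bigram_profile(doc_a), bigram_profile(doc_b)))
print(cosine_similarity(bigram_profile(doc_a), bigram_profile(doc_c)))
```

Because the score summarizes the whole document, two texts with similar habits of phrasing can score high without sharing any long passage, which is why exact word-for-word matches can get lost with this approach.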
With "fingerprinting," Turnitin's technology scans and identifies the unique fragments and the ordering of word fragments that appear in a document. With this level of analysis, we can uncover word string matches ("fragments"), but also look at the unique sequences of those matches to create a fingerprint of the document.
Fingerprints are entirely unique and can be identified by the specific features displayed in a print. The same can be said for documents. Each document has features such as phrasing, tone, and style that, if the document is completely original, are as unique as a fingerprint. If a document contains content that is unoriginal in its phrasing, it will match other document fingerprints that also contain this feature.
One issue this approach faces is how to avoid picking up very common words, like articles ("the," "an," "a"), conjunctions ("and," "but"), or prepositions ("of"), and home in on the strings of words that make a document unique. Fingerprinting gives us a way to exclude commonly used words, while providing us with the ability to identify when content is poorly paraphrased. Because Turnitin was developed for use in academic contexts, our approach with fingerprinting is to focus on features of the text that are clearly relevant to the content or subject matter of the document.
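The general idea of fingerprinting while ignoring common words can be sketched as follows. To be clear, this is not Turnitin's algorithm, which is not described in this post. It is a generic shingle-and-hash illustration, and the stop-word list, the 3-word fragment size, the use of SHA-1, and all function names are assumptions chosen for the example.

```python
import hashlib
import re

# A tiny, hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "but", "or", "of", "to",
              "in", "is", "it", "that", "was", "on"}

def content_words(text):
    """Tokenize the text and drop very common words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

def fingerprint(text, k=3):
    """Hash every run of k consecutive content words into a set of fragments."""
    words = content_words(text)
    shingles = (" ".join(words[i:i + k]) for i in range(len(words) - k + 1))
    return {hashlib.sha1(s.encode("utf-8")).hexdigest()[:12] for s in shingles}

def overlap(fp_a, fp_b):
    """Share of fragments the two fingerprints have in common (Jaccard index)."""
    if not fp_a or not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Hypothetical sample texts, invented for this illustration.
original = "The results of the experiment confirmed the hypothesis that sleep improves memory."
light_edit = "Results of an experiment confirmed a hypothesis that sleep improves memory."
unrelated = "Volcanic eruptions reshape coastal landscapes over many centuries."

print(overlap(fingerprint(original), fingerprint(light_edit)))
print(overlap(fingerprint(original), fingerprint(unrelated)))
```

Because the common words are removed before the fragments are built, swapping "the" for "an" or "a" does not hide the match between the first two sentences. This sketch only covers that narrow case; catching heavier paraphrasing or word substitution would take more than is shown here.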
For example, if you're finalizing your dissertation or thesis, you would want to make sure that all of the ideas you discussed were properly referenced and cited, and that sections where you paraphrased were paraphrased properly. What gets less emphasis is strict word-to-word matching. If a more keyword search-biased approach were used here, we'd be unable to identify poor paraphrasing or selective word substitution, which incidentally is the majority of what academics and educators see in student work.
If you're looking for strict word-to-word matches, you could use a search engine (which is what everyone did before the advent of plagiarism detection software). If you're looking at comparing one author's style to another, there's an approach for that. As for identifying content matches in academic-type writing, Turnitin has developed a fingerprint-based approach that excels at finding the content that matters, when it matters. In other words, Turnitin is designed to support students who are learning how to use the internet to do research, use source materials, and take ownership of their own writing and ideas.