
The new Turnitin Similarity Report: Updated integrity features

Navigate the end-to-end experience of the Similarity Report, learn what the report is (and isn’t), and how you can leverage this constantly-evolving tool to keep integrity at the core of all student submissions.

Laura Young
Content Marketing Specialist

What do you know about the Turnitin Similarity Report and how it works?

Plagiarism has likely been a societal issue for as long as words and artistic expressions have been etched onto clay and stone. Yet, now that writing has moved into the digital sphere, the practice of taking someone’s work and passing it off as one’s own has boomed, but so too has the technology that can support its detection and deterrence.

In 2000, Turnitin.com launched in its most basic form, leveraging database pattern-matching technology developed from Berkeley students’ doctoral research. Originally designed to detect pre-internet “frat file” plagiarism, the Turnitin Similarity Report was later adapted to deal with internet plagiarism, given the ease with which technology made (and continues to make) plagiarism more accessible.

Fast-forward more than twenty years, and the Similarity Report is no longer just about plagiarism.

Join us as we navigate the end-to-end experience of the Similarity Report—learn what the report is (and isn’t), and how you can leverage this constantly-evolving tool to keep integrity at the core of all student submissions.

What is the Turnitin Similarity Report?

A typical submission made to an assignment in Turnitin generates a Similarity Report. The Similarity Report is a flexible document, providing a comparison of student writing against an unparalleled repository of over 99 billion web pages, 1.8 billion student papers, and 89.4 million subscription articles that contribute to the best-in-class academic content across major disciplines.

The Similarity Report comprises a wide range of features, including the renowned similarity score (the percentage of content that matches the Turnitin databases), as well as on-paper highlights, filtration, and flag insights. These features allow instructors and administrators to zero in on the source of matches and pinpoint any discrepancies in a student’s writing, such as replaced characters or hidden text.

Although it was built as a powerful tool to help educators identify unoriginal or improperly-cited student writing, constant refinement has seen the Similarity Report become a valuable tool for promoting academic integrity: it helps students understand proper citation practices and assists educators in identifying potential instances of plagiarism or improper sourcing in submitted works. In a study centered on developing student academic writing using ‘plagiarism detection programs’, Li et al. (2021) found that tools such as the Turnitin Similarity Report “encouraged students to develop their writing … and resulted in enhanced paraphrasing skills when paired with explicit and focused instruction.”

Today, the Similarity Report gives institutions the means to promote formative assessment. For example, it gives students the ability to resubmit and complete a full source analysis, and offers access to Turnitin Draft Coach, all of which provide the opportunity to hone academic writing skills before final submission.

When the Similarity Report is used judiciously, students can understand academic integrity from the outset of their university career, educators can gather insight into potential skills and knowledge gaps, and the institution as a whole can remain confident that its degrees hold weight in the outside world.

Does the Similarity Report check for plagiarism?

It’s a common misconception that the Turnitin Similarity Report is a plagiarism detection tool, and we believe that setting the record straight among our education community is key to promoting fair and correct usage within the institutions that we serve.

In short, Turnitin cannot offer an airtight assessment of whether a paper includes plagiarized material, since it can only detect similarities in submitted text.

The Similarity Report is, in and of itself, a piece of text-matching software that checks against the Turnitin database to reveal matches. If there are instances where a student’s writing is similar to—or matches against—a source in the Turnitin database, it is flagged for review in the Similarity Report. Instructors and administrators can then dig deeper into the source to determine whether it is or is not an acceptable match.

It is perfectly normal for some student writing to match against the Turnitin database. Quotations and citations are generally acceptable matches; they illustrate findings and extend a second voice to a piece of work.

However, unacceptable matches are only distinguishable through human interpretation. Rather than using the Similarity Report to form a full picture, institutions are encouraged to treat it as a single puzzle piece that contributes to a wider investigation.

Daoud et al. (2019) observed that use of the Turnitin Similarity Report during assessment resulted in a reduction in plagiarism among 53 students, when compared to an assignment completed outside of Turnitin. Their student participant group also indicated that, “the use of the [Turnitin] software should be accompanied by hands-on education on the best practices of academic integrity and research writing by their professors.”

How can educators use the Similarity Report as an investigative tool?

Possibly one of the Similarity Report’s most-talked-about features is the similarity score. It has long played a central role in many discussions following the submission of an assignment. Is this similarity score acceptable? Does this similarity score suggest plagiarism? What does the similarity score mean? Simply put, the similarity score reveals the percentage of a paper's content that matches Turnitin's database.

And while many institutions may have assigned an acceptable threshold for the similarity score, contrary to popular belief, there is no ideal score; the similarity score alone is not sufficient to decide on an informed next step. Contextual details are key to substantiating a similarity score, or refuting it completely.

In an attempt to ‘decode the myth about Turnitin’, Mphahlele and McKenna (2019) assert that, “if Turnitin is primarily used as a policing tool, students are not only denied access to nuanced pedagogical interventions that might develop their academic writing, but its misuse could also change students’ behavior in undesirable ways.”

Gathering a collection of evidence helps educators understand the issue at hand. Consider the following contextual information that, at both educator and institutional level, can support an academic misconduct investigation.

Writing genre can affect the similarity score

A student submitting a quantitative research analysis may generate a wildly different similarity score compared to a student submitting a qualitative analysis. This is likely due to a large disparity between the volume of quotes and citations included in the students’ respective papers. Matching text does not equate to plagiarism if a student has cited and quoted proportionately and correctly.

Assignment length may influence the similarity score

Submitting a document of considerable size could result in a 0% similarity score with a Similarity Report that still contains questionable matches. This is because the matched text can make up such a small share of a long document that the similarity score is rounded to 0%, rather than being exactly 0%.
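To make the rounding effect concrete, here is a minimal sketch of the arithmetic. It assumes the score is the matched word count divided by the total word count, rounded to the nearest whole percent; the function name and the exact rounding rule are illustrative assumptions, not Turnitin’s documented calculation.

```python
def similarity_score(matched_words: int, total_words: int) -> int:
    """Return matched text as a whole-number percentage of the paper.

    Assumption: word-based counting and round-to-nearest, for illustration only.
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return round(100 * matched_words / total_words)


# The same 40 matched words look very different in papers of different lengths.
long_paper_score = similarity_score(matched_words=40, total_words=10_000)  # 0.4% of the text
short_paper_score = similarity_score(matched_words=40, total_words=500)    # 8.0% of the text
print(long_paper_score, short_paper_score)
```

In this sketch the long paper displays a score of 0 even though 40 words matched a source, which is why the matches themselves, not just the headline number, are worth opening.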

Level of mastery may be unveiled by the similarity score

By assessing the Similarity Report as a whole, you may spot a heavy overreliance on direct quotes throughout a paper; this can reveal a lack of understanding and an absence of original ideas. Although the student has cited correctly, ruling out misconduct, a large number of quotes could be grounds for a bigger conversation around subject knowledge. This leaves scope to teach students about effective quoting, paraphrasing and summarizing in order to reduce the amount of matching text.

Draft submissions may affect the score of final work

In normal circumstances, Turnitin’s self-exclusion filter will discount all previous draft submissions to the same assignment, but technical issues can come into play. If a student submits multiple draft papers as a non-enrolled student, their own papers may match against one another, causing a higher-than-expected similarity score.

Flags indicate intention in the similarity score

One of the Similarity Report’s newer features is its ability to identify replaced characters or hidden words. This data can help to determine intentionality when investigating potential misconduct. For many educators, the degree of intention makes a big difference when deciding whether to respond punitively or with developmental feedback in mind.
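As a rough illustration of what a replaced character looks like to software, the sketch below flags words that mix alphabets, such as a Cyrillic “а” placed inside an otherwise Latin word. This is one simple heuristic chosen for illustration; the function names are hypothetical and it is not a description of how Turnitin’s flags are actually computed.

```python
import unicodedata


def scripts_in_word(word):
    """Return the set of scripts used by the letters of a word.

    The script is approximated by the first word of each character's
    Unicode name, e.g. 'LATIN' or 'CYRILLIC' (an illustrative shortcut).
    """
    scripts = set()
    for ch in word:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def flag_mixed_script_words(text):
    """Return the words whose letters come from more than one script."""
    return [word for word in text.split() if len(scripts_in_word(word)) > 1]


# "\u0430" is the Cyrillic small letter a, visually identical to Latin "a".
sample = "The c\u0430t sat on the mat."
print(flag_mixed_script_words(sample))
```

A word flagged this way looks normal on screen but no longer matches the same word in a source, which is why such substitutions are treated as a signal worth reviewing rather than as proof of intent.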

Learn more about a score from students themselves

Outside the realm of the Similarity Report, an important source of data is the student themselves. Can a conversation with the student yield insight into skills deficits or possible extenuating circumstances that may have contributed to their similarity score?

As the Similarity Report only checks for text matches, a low similarity score can also present other concerns. The content may be original, but there is a chance that the student submitting the paper may not be the original author, bringing the possibility of contract cheating or AI writing into play. These forms of misconduct can push institutions to reevaluate their mechanisms for proof of learning, and how they measure originality and critical thinking skills.

Suffice it to say, although the similarity score is a core component of the Similarity Report, many other factors should be considered when determining what is an acceptable score and what is not. A high similarity score does not always mean that a piece of writing has been plagiarized, just as a low similarity score cannot always rule out academic misconduct.

What features does the new Similarity Report offer?

Elevating our commitment to innovation to new heights, Turnitin is excited to introduce the enhanced experience of the Similarity Report. With a long-awaited new and intuitive interface, the new Similarity Report is set to offer an elevated user experience for all Turnitin user types.

The new Similarity Report offers a cohesive experience

Our new tab navigation allows instructors and administrators to easily access different report functionality, such as AI-writing detection, similarity matching, and the Flags Panel, and switch between them seamlessly.

The new Similarity Report provides actionable insights supporting formative learning

A key component of the new Similarity Report is its new source cards, which present a wealth of information about a highlighted match and its source material. Educators and students can extract valuable information from source cards, such as the percentage of matched text, and the number of matched text blocks and words. For educators, these source cards are a meaningful asset for offering context-specific feedback to students, guiding them towards better writing practices.

Simplified settings in the new Similarity Report increase ease of use

The new filters panel provides the option to exclude irrelevant matches and modify database settings, allowing all user types to customize individual reports with ease. This flexibility empowers educators to tailor each report according to their investigation, reaching a more informed conclusion around skills gaps, intent, and student success.

The new Similarity Report has a user-friendly, accessible design

The new Similarity Report boasts a streamlined layout, optimized for readability and simplicity. This deliberate approach to design saves educators time and improves efficiency when interpreting report results. With its enhancements, the new Similarity Report also meets the latest accessibility standards, ensuring an inclusive experience for all user types in Turnitin.

The new Similarity Report is a result of accelerated innovation

Our updated technology lays a robust foundation for delivering functionality at a faster pace. We’re confident that we can now provide institutions with the latest and most advanced tools to help educators innovate. We see this as a step towards improving the effectiveness of the assessment journey and making academic integrity a core value among students.

Does the new Similarity Report address AI writing detection?

In April 2023, Turnitin’s AI-writing detection capabilities launched across many of our integrity solutions—a milestone in combating the improper use of AI writing tools, such as ChatGPT.

As a first step, accessing the new AI Writing Report meant opening a separate window and leaving the Similarity Report experience, which required you to change how you worked. But with its updated technology, the new Similarity Report provides a new foundation to deliver AI writing detection at a faster pace.

We are proud to now have the means to provide institutions with the latest and most advanced tools to support their teaching and assessment needs, hosting a fully integrated experience that gathers similarity, flag insights, and AI writing detection tools, and brings them into one cohesive workflow.

How does the new Similarity Report support feedback?

Unintentional plagiarism lends itself well to developmental opportunities in the classroom, but to be able to determine intentionality, educators require data and insights that have long been difficult to gather and make sense of.

Turnitin’s new Similarity Report has been thoughtfully redesigned with a new intuitive interface and match categorization panel, making it easier to interpret and use as a formative assessment tool that strengthens academic writing skills. Educators can now see matches with common characteristics grouped into four categories, according to the extent to which a student has cited or quoted throughout their paper:

  • Not Cited or Quoted: Text matches are not quoted, or the original source is not cited. These matches could suggest plagiarism and require further investigation.
  • Missing Quotations: Text matches are cited, but the match is so exact that it may also require quotation marks. These matches may be an opportunity to provide formative feedback on how to properly cite and attribute sources.
  • Missing Citation: Text matches are quoted, but the original source is not cited. These matches may be an opportunity to provide formative feedback on how to properly cite and attribute sources.
  • Cited and Quoted: Text matches are quoted and cited to a source. These matches are a great opportunity to spotlight student strengths.
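The four categories above can be read as the combinations of two yes/no questions: is the matched text in quotation marks, and is its source cited? The sketch below expresses that reading; the function name is hypothetical, and treating “Not Cited or Quoted” as “neither quoted nor cited” is an assumption made so that the four groups do not overlap.

```python
def categorize_match(is_quoted: bool, is_cited: bool) -> str:
    """Map a match to one of the four match categories described above.

    Assumption: each match has exactly two boolean properties, and
    "Not Cited or Quoted" means neither property holds.
    """
    if is_quoted and is_cited:
        return "Cited and Quoted"
    if is_cited:
        return "Missing Quotations"   # cited, but the exact wording is not quoted
    if is_quoted:
        return "Missing Citation"     # quoted, but no source is given
    return "Not Cited or Quoted"


for quoted in (True, False):
    for cited in (True, False):
        print(quoted, cited, categorize_match(quoted, cited))
```

Only the last group suggests a need for further investigation; the two middle groups point to feedback on attribution, in line with the descriptions above.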

In an age where there is an abundance of information available to quote and cite, the new Similarity Report can teach students the value of ethical writing. Match categorization makes interpreting and sorting through matches easier than ever. Educators are now able to quickly distinguish between intent, teachable moments, and student success, freeing up more time for crafting meaningful feedback that addresses students’ unique needs.

Since 2020, the Similarity Report has also featured the Flags Panel, highlighting text manipulations in a student paper, such as replaced or hidden characters. Whilst these forms of match evasion may seem much more deliberate than a missing quote or citation, they can help educators to evaluate student understanding and skills development needs, whilst also highlighting the possibility that a student is struggling and requires extra support.

If students are given continuous feedback, plus the opportunity to revise and resubmit, they can practice making judgments on the information and media that they choose to cite in their papers. Turnitin continues to design a Similarity Report that encourages students to incorporate legitimate information and media into their writing through proper attribution.

How does Turnitin generate a Similarity Report?

When a student submits a paper to Turnitin, this triggers an action in the backend that, in seconds, delivers a fully comprehensive Similarity Report. Every second, Turnitin generates twenty Similarity Reports, and on the busiest days, the system can receive more than one million submissions. But how does it work?

To generate a Similarity Report, Turnitin takes a paper and breaks its text down into phrases, whilst discounting any common words such as “and,” “or,” and “the.” To identify similar content, a unique ID is assigned to each phrase, then compared to seven trillion possible phrase matches in the Turnitin database. If Turnitin identifies a potential match, it applies natural language processing and strict matching to limit false positives and generate the most accurate report.
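The general idea of turning phrases into IDs and comparing them can be sketched in a few lines. The example below is a generic text-fingerprinting illustration, not Turnitin’s proprietary algorithm: the three-word phrase length, the SHA-1 hash, the tiny stop-word list, and the function names are all assumptions chosen for brevity, and it omits the natural language processing step entirely.

```python
import hashlib
import re

# Assumption: a tiny stop-word list taken from the examples in the text.
STOP_WORDS = {"and", "or", "the"}


def phrase_ids(text, phrase_len=3):
    """Return a set of IDs, one per overlapping phrase of phrase_len words."""
    words = [w for w in re.findall(r"[a-z0-9']+", text.lower())
             if w not in STOP_WORDS]
    phrases = (" ".join(words[i:i + phrase_len])
               for i in range(len(words) - phrase_len + 1))
    # A short hash prefix stands in for the "unique ID" of each phrase.
    return {hashlib.sha1(p.encode("utf-8")).hexdigest()[:12] for p in phrases}


def shared_phrase_ratio(paper, source):
    """Return the fraction of the paper's phrase IDs also found in the source."""
    paper_ids = phrase_ids(paper)
    if not paper_ids:
        return 0.0
    return len(paper_ids & phrase_ids(source)) / len(paper_ids)


paper = "The quick brown fox jumps over the lazy dog"
source = "A quick brown fox jumps over a lazy dog and sleeps"
print(shared_phrase_ratio(paper, source))
```

Comparing compact IDs rather than raw text is what makes it feasible to check a paper against a very large collection, since set lookups stay fast regardless of how long the original phrases were.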

In tandem, the Similarity Report algorithm can look deeply at a document for any inconsistencies that would set it apart from a normal submission. It flags anything it deems as ‘strange’ for further review.

But the Turnitin algorithm can only go so far. Although a Similarity Report can be generated based on Turnitin’s default settings, a refined report relies heavily on the configuration of a Turnitin assignment before students start submitting.

During the setup of a Turnitin assignment, instructors and administrators are met with an array of advanced Similarity Report settings, which grant the freedom to define a set of parameters and ensure accuracy in every similarity score.

Are all papers that receive a Similarity Report checked against the Turnitin database?

Turnitin has three primary databases, offering institutions comprehensive coverage across the internet, scholarly articles and student papers. Turnitin utilizes these databases to help identify different types of plagiarism in a student paper.

  • The student paper database is an archive of student papers from around the world, spanning over twenty years. This database aims to discourage and help to identify cases of student collusion, regardless of institution, country, language, or time of study.
  • Turnitin’s database of current and archived internet content uses a proprietary crawler to target the websites most likely used by students and researchers. It can compare matches against individual internet sources to highlight potential copy/paste plagiarism.
  • With access to a collection of the top scholarly content, across all disciplines, delivered directly from publishers and Open Access repositories, researchers and students can compare their original work against published works from around the world.

Whilst the search settings are completely customizable, it’s recommended that institutions enable all search repositories to generate an all-inclusive Similarity Report that leaves no stone unturned.

Does the Similarity Report exclude bibliographic material?

Turnitin's machine learning algorithm can identify and subsequently exclude the areas that constitute a bibliography or quote in a student’s paper. This is an advanced setting that instructors and administrators can enable at assignment setup. It is deactivated by default and is therefore commonly overlooked.

By enabling this content exclusion setting, the Similarity Report becomes more than just a tool for spotting similar content; it morphs into a learning opportunity for both educator and student. Opting to exclude bibliographic material facilitates a higher level of focus on student citation errors, allowing educators to leverage the Similarity Report as a formative tool that enhances students’ referencing skills.

By dynamically discounting bibliographic material from the Similarity Report, educators can gain confidence that each paper's similarity score relies solely on the content submitted as original writing. This step contributes to creating a level playing field for students submitting different types of work, such as qualitative vs. quantitative analyses.

How does the Similarity Report identify student collusion?

The effect of culture on tolerance levels in academic integrity can vary widely from country to country. Some cultures are much more collectivist in their approach to academia, and may struggle to understand the implications associated with unauthorized collaboration or student collusion.

More generally, Sutton, Taylor and Johnston (2012) find that “students consider plagiarism related to group work to be far less serious than other types,” and this lack of understanding can lead to accidental plagiarism. So, how can you use the Turnitin Similarity Report to help your students understand the issues around collusion and turn it into a teachable moment?

  • Opt to search across all available repositories.
  • Opt to store student papers in all available repositories.
  • Tread cautiously when excluding small sources. Setting a high word-count threshold for exclusion could inadvertently hide student collusion. Think carefully about the source length you consider to be truly insufficient before enabling this setting.
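To see why a high small-source threshold deserves caution, consider this small sketch. The match data is invented for illustration, and the filtering rule (drop any match shorter than the threshold) is an assumed simplification of the exclusion setting rather than its exact behavior.

```python
def visible_matches(matches, min_words):
    """Keep only matches at least min_words long; shorter ones are excluded."""
    return [m for m in matches if m["words"] >= min_words]


# Hypothetical matches found in one paper.
matches = [
    {"source": "peer submission", "words": 18},
    {"source": "journal article", "words": 60},
]

# With a low threshold, both matches stay in the report.
print([m["source"] for m in visible_matches(matches, min_words=5)])
# With a high threshold, the short overlap with a peer's paper disappears.
print([m["source"] for m in visible_matches(matches, min_words=25)])
```

Overlap between two students’ papers is often short, so it tends to be the first thing a generous threshold removes.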

There are several ways in which students can collude. Students may work together on a group project, but fail to realize that their end-of-project assignment requires independent output. Submitting the same or similar work could be perceived as unintentional student collusion. In other circumstances, a student may copy the work of another student and submit it as their own, signaling potential misconduct.

However, on occasion, educators may find that what initially presents itself as collusion is in fact self-plagiarism: digging deeper reveals that the same student has authored two matching submissions.

Many students believe that reusing their own work isn’t plagiarism, and you can use this Similarity Report discovery to increase your students’ understanding of self-plagiarism, and how it can put their academic integrity in jeopardy.

Although self-plagiarism mightn’t be a particularly prevalent issue in your institution, its existence alone highlights the importance of using the Similarity Report as an investigative tool when looking for student collusion.

Conclusion: How the new Similarity Report supports integrity and feedback

A full understanding of the multi-faceted role of the Similarity Report promises to bring new meaning to the way that institutions choose to adopt it for teaching, assessment, and the plagiarism investigation process.

As teaching and learning practices develop and innovate over time, educators are actively discovering new ways to manage their time and effort. The new Similarity Report’s Match Groups feature is set to play a critical role in surfacing formative feedback opportunities around citing and referencing. Educators can identify and ultimately bridge student skills gaps to reduce the risk of unintentional plagiarism.

With an increased number of data insights now readily available to students, instructors, and administrators, the illustrious similarity score can assume a more subsidiary role in determining intent, reinforcing our position as a similarity checker that supports—rather than defines—an inquiry into academic misconduct.

Turnitin’s upgraded technology makes it possible to deliver Similarity Report functionality at a faster pace, when educators need it most, thus paving a new way for how institutions around the world approach academic integrity and manage the threat of academic misconduct.