How does iThenticate work? Tools for advancing research integrity

Plagiarism is not just limited to the classroom. Join us as we walk through the core features and capabilities that make iThenticate an indispensable ally to the research community.

Laura Young
Content Marketing Specialist

Turnitin first rose to prominence as a novel way to combat the rising convenience of internet plagiarism in the 1990s. In the decades since, it has become a well-known integrity solution for educators and institutions worldwide.

However, plagiarism is not limited to the classroom. In recognition of this, iThenticate emerged: a specialized tool designed to uphold research excellence for researchers, publishers, academic and business leaders, admissions officers, and government officials.

Moher et al. (2020) explain that “For knowledge to benefit research and society, it must be trustworthy.” iThenticate has been carefully designed to help researchers submit original manuscripts, and give publishers the confidence that they are issuing ethical papers. But how does iThenticate provide these types of reassurances? In this blog post, we’ll unveil the inner workings of iThenticate, delving into the distinct nuances that, since 2004, have made it an indispensable ally to the research community.

Does iThenticate check for plagiarism?

It’s a common misconception that the Similarity Report is a plagiarism detection tool, and we believe that setting the record straight among our education community is key to promoting fair and correct usage within the institutions that we serve.

The Similarity Report is, in and of itself, a piece of text-matching software that checks against the Turnitin database to reveal matches. If there are instances where a piece of writing is similar to—or matches against—a source in the database, it is flagged for review in the Similarity Report.
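To make the idea of text matching concrete, here is a minimal sketch of one common approach: breaking documents into overlapping word sequences ("shingles") and looking them up in an index of known sources. This is an illustration only, not Turnitin's actual algorithm; the function names, the four-word window, and the tiny in-memory "database" are all assumptions made for the example.

```python
import re
from collections import defaultdict


def shingles(text, n=4):
    """Return the set of overlapping n-word sequences in a text (lowercased)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def build_index(sources, n=4):
    """Map each shingle to the ids of the sources that contain it.

    `sources` is a hypothetical stand-in for a content database:
    a dict of {source_id: full_text}.
    """
    index = defaultdict(set)
    for source_id, text in sources.items():
        for shingle in shingles(text, n):
            index[shingle].add(source_id)
    return index


def find_matches(submission, index, n=4):
    """Return {source_id: set of shared shingles} for a submitted text."""
    hits = defaultdict(set)
    for shingle in shingles(submission, n):
        for source_id in index.get(shingle, ()):
            hits[source_id].add(shingle)
    return dict(hits)


sources = {
    "paper-1": "For knowledge to benefit research and society, it must be trustworthy.",
    "paper-2": "Similarity does not equal plagiarism.",
}
index = build_index(sources)
matches = find_matches(
    "Authors agree that knowledge to benefit research and society needs review.",
    index,
)
for source_id, shared in matches.items():
    print(source_id, sorted(shared))
```

Note that the output is only a list of overlapping passages per source. Deciding whether an overlap is a legitimate quotation or a problem is left to the reader, which is the point the next section makes.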

Similarity does not equal plagiarism. It is perfectly normal for some academic writing to match against the Turnitin database. Quotations and citations are generally acceptable matches; they illustrate findings and extend a second voice to a piece of work. It is important to investigate a source in the Similarity Report thoroughly to determine whether a match is or is not acceptable. Unacceptable matches can only be identified through human interpretation, so we encourage treating the Similarity Report as a single puzzle piece in a wider investigation rather than as the full picture.

How is the iThenticate Similarity Report used to deter research misconduct?

While researchers, publishers, and scholars may use iThenticate to satisfy a separate set of objectives, everyone involved in the process is focused on one end goal, advancing research integrity, and the iThenticate Similarity Report brings them one step closer to achieving this.

The Similarity Report is a flexible document with a wide range of functionality, including the well-known similarity score, on-paper highlights, and filtering. These features allow researchers and publishers to zero in on the source of matches and pinpoint any discrepancies in a manuscript.

In a survey of the multiple uses of iThenticate among 119 students and 26 supervisors, McCulloch, Behrend and Braithwaite (2021) found that “while for both groups iThenticate’s regulatory function in preventing plagiarism (whether intentional or accidental) was important, the system’s potential educational function in improving research writing capability and publication was equally important.”

Although the Similarity Report was built to help identify unoriginal or improperly cited writing, constant refinement has also made it a valuable tool for promoting academic integrity. In addition to its similarity detection features, the new Similarity Report, available in iThenticate 2.0, provides further integrity insights, helping researchers not just identify plagiarism but also learn what is and is not considered acceptable practice in research writing.

At Turnitin, we believe that user experience is more than just a concept; it's a tangible feeling. We've made a concerted effort to transform every iThenticate interaction into a positive and seamless journey.

Identify potential AI-generated writing

In April 2023, Turnitin’s AI writing detection capabilities launched across many of our integrity solutions—a milestone in combating the improper use of AI writing tools, such as ChatGPT. The inclusion of AI writing detection in iThenticate gives researchers, publishers, and scholars the tools they need to protect themselves from this emerging form of misconduct as they embark on their research and publication journey.

Generative AI's complexity makes manual AI writing detection challenging. We aim to ease this burden for researchers and publishers, minimizing the effort associated with checking for AI-generated content. Turnitin’s AI writing detection tool, available within the iThenticate Similarity Report, can check papers en masse, enhancing efficiencies across the traditionally intensive publication process.

But AI writing detection does not stop at technology; it is simply one part of a whole. We must remember that “as we all glide into an artificially drafted future, it's clear that a human questioning mindset will be needed. Indeed, our investigative skills and critical thinking techniques could be in more demand than ever before” (O'Brien, 2023).

We urge our publishing partners to view an author’s AI writing score as an indicator that fits within a broader investigation. While an AI writing detection tool highlights the potential use of AI, it can't offer a conclusive judgment. We advocate for prioritizing human interpretation when looking for AI-generated content in an author's paper, accounting for factors like false positives, intent, and the author's known capabilities.

iThenticate 2.0 introduces a revitalized and contemporary interface, crafted with our research community in mind, and enabling effortless navigation from the initial onboarding stage through to the final similarity check. iThenticate’s new look is consistent and deliberate to ensure that even its newest users can get off to the best start. We take great pride in sharing that iThenticate 2.0 meets accessibility standards, thus increasing inclusivity among our education community.

Surface text manipulations to highlight misconduct

The Similarity Report now features the Flags Panel, which highlights text manipulations in a manuscript, such as replaced or hidden characters. Turnitin’s algorithms look deeply at a document for any inconsistencies that would set it apart from a normal submission, and if we notice something strange, we flag it for review. In some cases, a flag is not necessarily an indicator of a problem; however, it’s where we suggest focusing attention for further review.

Hidden or replaced characters may signal a deliberate attempt to exploit exclusion mechanisms or interrupt a similarity match within the Similarity Report. These forms of match evasion are usually deemed as being more calculated than a simple missing quote or citation and can help publishers determine intentionality when reviewing submitted manuscripts for publication.
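As a rough illustration of what "hidden or replaced characters" can look like in practice, the sketch below flags two simple patterns: invisible zero-width characters inserted into a word, and words that mix alphabets (for example, a Cyrillic "а" swapped in for a Latin "a"). It is a simplified assumption of how such checks might work, not a description of the Flags Panel's actual logic; the names `INVISIBLE`, `script_of`, and `find_flags` are hypothetical.

```python
import unicodedata

# A few zero-width / invisible code points (illustrative, not exhaustive).
INVISIBLE = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def script_of(ch):
    """Rough script label: the first word of the character's Unicode name."""
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def find_flags(text):
    """Return (word_index, word, reason) tuples for words worth a closer look."""
    flags = []
    for i, word in enumerate(text.split()):
        # Invisible characters can split a word so that it no longer matches a source.
        if any(ch in INVISIBLE for ch in word):
            flags.append((i, word, "hidden character"))
        # Look-alike letters from another alphabet show up as mixed scripts in one word.
        scripts = {script_of(ch) for ch in word if ch.isalpha()}
        if len(scripts) > 1:
            flags.append((i, word, "mixed scripts"))
    return flags


sample = "A cle\u200ban pl\u0430giarism check"
for index, word, reason in find_flags(sample):
    print(index, repr(word), reason)
```

As with the Flags Panel itself, a hit here is a prompt for human review rather than proof of intent: multilingual text, for instance, can legitimately mix scripts.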

Categorize matches to determine intentionality

Elsewhere in this post, we discuss self-plagiarism and the importance of citing previously published work. Recycling your own work is generally acceptable if citations and quotations are used accordingly. But what if this detail is overlooked?

Unintentional plagiarism lends itself well to developmental opportunities at all stages of education—even postgraduate level. But data and insights have long been difficult to gather, making it a challenge for publishers to determine intentionality when it comes to academic misconduct.

The iThenticate Similarity Report makes it easier to draw the line between citation mistakes and deliberate omissions. Authors, publishers, and scholars can now see matches with common characteristics grouped into four categories, according to the extent that an author has cited or quoted throughout their manuscript:

  • Not Cited or Quoted: Text matches are not quoted, or the original source is not cited. These matches could suggest plagiarism and require further investigation.
  • Missing Quotations: Text matches are cited, but the match is so exact that it may also require quotation marks.
  • Missing Citation: Text matches are quoted, but the original source is not cited.
  • Cited and Quoted: Text matches are quoted and cited to a source.

In an age where there is an abundance of information available to quote and cite, match categorization makes interpreting and sorting through matches easier than ever, helping to quickly discern between intent, areas for improvement, and writing success.
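The four categories above amount to a two-by-two decision on whether a match is quoted and whether it is cited. The short sketch below spells that out; the function name and boolean inputs are assumptions for illustration, and how iThenticate actually determines "quoted" and "cited" is not covered here.

```python
def categorize_match(is_quoted: bool, is_cited: bool) -> str:
    """Map the quoted/cited status of a text match to one of the four categories."""
    if is_quoted and is_cited:
        return "Cited and Quoted"
    if is_cited:
        # Cited, but the exact wording is not in quotation marks.
        return "Missing Quotations"
    if is_quoted:
        # In quotation marks, but no source is given.
        return "Missing Citation"
    return "Not Cited or Quoted"


for quoted in (True, False):
    for cited in (True, False):
        print(f"quoted={quoted}, cited={cited}: {categorize_match(quoted, cited)}")
```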

We understand the concerns of our research community when it comes to data compliance, and we’re incredibly proud to share that all data transferred to us is stored on a highly secure AWS data platform. With data centers in the United States, Europe, and across the Asia-Pacific region, this assures Turnitin customers that all personal data uploaded to iThenticate 2.0 is safe, protected and being stored and processed according to the world's highest standards.

Our updated technology lays a robust foundation for delivering functionality at a faster pace. We’re confident that with state-of-the-art technology, we can now provide the latest and most advanced tools to help institutions innovate at speed; we see this as a step towards improving the effectiveness of all involved in the publishing journey.

Dynamically exclude matches to refine the similarity score

Exclusion capabilities in iThenticate are highly regarded, as researchers, publishers, and scholars seek to check only text submitted as original writing against the Turnitin database. iThenticate offers an array of exclusion capabilities that allow full refinement of the Similarity Report and a more accurate similarity score. The following items can be excluded from the Similarity Report of a submitted manuscript:

  • Bibliographic content, identified using a set of beginning and terminating phrases.
  • Quotes, identified between specific types of quotation marks.
  • Citations, both in-line and in reference sections.
  • Small matches, defined by number or percentage.
  • Preprints, with the opportunity to manually add additional preprint repositories.
  • Specific content databases, including internet, publication, and submitted works, plus private repositories.

By using iThenticate’s settings to dynamically discount various types of material from the Similarity Report, researchers can be confident that each paper's similarity score relies solely on the content submitted as original writing. This step also contributes to creating a level playing field for researchers and scholars submitting different types of work, such as qualitative vs. quantitative analyses.
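To show why exclusions change the similarity score, here is a minimal sketch that assumes the score is simply the percentage of a manuscript's words that fall inside retained matches. That formula, the `Match` record, and the `kind` labels are assumptions for the example rather than iThenticate's documented calculation.

```python
from dataclasses import dataclass


@dataclass
class Match:
    words: int  # number of manuscript words covered by this match
    kind: str   # hypothetical label: "body", "quote", "bibliography", "citation"


def similarity_score(total_words, matches, excluded_kinds=frozenset(), min_words=0):
    """Percentage of manuscript words in matches that survive the exclusion settings."""
    kept = [
        m for m in matches
        if m.kind not in excluded_kinds and m.words >= min_words
    ]
    matched_words = sum(m.words for m in kept)
    return round(100 * matched_words / total_words, 1) if total_words else 0.0


matches = [
    Match(120, "body"),
    Match(80, "quote"),
    Match(150, "bibliography"),
    Match(6, "body"),  # a small match, e.g. a common phrase
]

print(similarity_score(1000, matches))
print(similarity_score(1000, matches,
                       excluded_kinds={"quote", "bibliography"},
                       min_words=8))
```

In this made-up example, most of the raw score comes from the bibliography and quoted text; once those and the small match are excluded, only the 120-word body match counts toward the score.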

How does iThenticate highlight collusion between authors?

Having access to highly relevant institutional or industry content is crucial for organizations to ensure that high-stakes work is thoroughly checked for potential plagiarism before manuscript submission. This includes collusion within an academic institution or among published researchers.

“The ubiquity of the Internet, the ever-intensifying demand to publish or perish, and maybe, a general shift in perceptions of what constitutes ‘bad’ plagiarism and collusion … mean that the time may be ripe for a consideration by academic writers and journal editors of how they regard and deal with the whole area” (Sikes, 2009).

We want to help publishers protect their reputations and publish submitted manuscripts with confidence. Now included as standard in an iThenticate 2.0 license, institutions can offer exclusive use of a private storage repository to their authors.

Whilst having access to Turnitin’s extensive content database supports plagiarism detection in iThenticate, being able to check all previously submitted works across all the journals they work on gives publishers the means to spot collusion between authors.

Checking against a private repository allows for a completely concentrated Similarity Report, centered around only the papers in a publisher’s repository, and without dilution from external sources.

How can iThenticate help to mitigate self-plagiarism in both published and unpublished works?

While the most clear-cut form of plagiarism is using someone else’s work as your own, self-plagiarism (or text-recycling) can be just as threatening to the values of research integrity. Self-plagiarism pertains to authors recycling their own, previously published content and presenting it as a "fresh" creation without disclosing this via acknowledgement, such as a citation.

The Office of Research Integrity raises a thought-provoking question: “Given that plagiarism is often conceptualized as theft, the notion of self-plagiarism does not seem to make much sense. After all, is it possible to steal from oneself?” So, why is self-plagiarism unethical? Although presumed as a relatively harmless form of plagiarism, “it is an intentional attempt to deceive a reader by implying that new information is being presented” (Bonnell et al., 2012).

Regardless of intentionality, self-plagiarism can have a long-lasting impact on reputation and impact factor within the research community. Journal editors may even consider publishing a retraction article if they identify significant overlap between publications by the same author(s). Copyright infringement can come into play if permission to reuse content has not been granted by publishers.

How does the doc-to-doc comparison tool work in iThenticate?

At Turnitin, we see huge benefits in plagiarism deterrence and mitigation over plagiarism checking. iThenticate gives research authors the tools they need to put their best foot forward. Before submitting a manuscript to a publisher, authors can use the iThenticate doc-to-doc comparison tool to avoid the risk of content duplication.

The doc-to-doc comparison feature in iThenticate allows authors to compare a subset of their unpublished writing against a soon-to-be-submitted manuscript to generate a comparison report. This gives authors the confidence that any manuscripts still in flux will not trigger a Similarity Report match when they’re submitted for future publication.
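For a sense of what a document-to-document comparison involves, the sketch below uses Python's standard-library `difflib.SequenceMatcher` to list word sequences that two drafts share. It is a generic text-comparison technique swapped in for illustration, not the method iThenticate uses, and the `shared_passages` helper and its five-word threshold are assumptions.

```python
from difflib import SequenceMatcher


def shared_passages(draft_a, draft_b, min_words=5):
    """Return word sequences of at least `min_words` that appear in both drafts."""
    a = draft_a.lower().split()
    b = draft_b.lower().split()
    # autojunk=False stops difflib from ignoring very frequent words in long texts.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        " ".join(a[block.a:block.a + block.size])
        for block in matcher.get_matching_blocks()
        if block.size >= min_words
    ]


unpublished = ("We recruited forty participants from two clinics "
               "and measured blood pressure weekly")
manuscript = ("In the second study we recruited forty participants from two clinics "
              "but measured heart rate daily")

for passage in shared_passages(unpublished, manuscript):
    print(passage)
```

Here the shared methods sentence is surfaced while the single shared word "measured" is ignored, which mirrors the kind of reused methodology text discussed below.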

Checking against unpublished works can also be beneficial if, as a researcher, you plan on submitting more than one manuscript with similar methodologies and motivations. Although your manuscripts may have widely differing content and focus, this reused content still could be flagged in the iThenticate Similarity Report. Of course, if you believe that reusing already-published text is ethically justifiable, it is important to seek the agreement of all involved parties and make clear acknowledgements to the old publication in your new publication (Israel, 2019).

The guidelines around methodology and protocol reuse are actively changing. Speaking to Cathleen O’Grady, Evan Kharasch, Editor-in-Chief of the medical journal Anesthesiology, notes that the journal now allows methods that are identical or substantially similar to prior publications, as long as authors cite their original paper: "It seemed appropriate to enable people to use their best description of what they had done."

Doc-to-doc comparison also helps publishers confirm that all co-authors’ contributions have been attributed appropriately before publication.

Conclusion: Using iThenticate to advance research integrity

In an age where knowledge sharing knows no bounds, Turnitin stands in partnership with the research community, reinforcing the values of honesty, transparency, rigor, respect, and accountability, as defined by the UK Research Integrity Office.

With the support of iThenticate and its thoughtfully designed suite of features, researchers, publishers, and scholars are equipped to champion research integrity and ensure that all involved in the publication process are unwaveringly responsible in their approach to the dissemination of information.