GPTZero: how to detect ChatGPT plagiarism

In the short time it’s been available, ChatGPT has had a massive impact on the way people think about writing and coding.

However, its ability to generate text on demand has come with a significant downside, particularly in education, where students are tempted to use ChatGPT for their own papers or exams. That brand of plagiarism prevents students from learning as much as they could and has given teachers a whole new headache: how to detect AI use.

Teachers and other users are now looking for ways to detect the use of ChatGPT in students’ work, and many are turning to tools like GPTZero, a ChatGPT detection tool built by Princeton University student Edward Tian. The software is available to everyone, so if you want to try it out and see the chances that a particular piece of text was written using ChatGPT, here’s how you can do that.

What is GPTZero?

A MidJourney rendering of a student and his robot friend in front of a blackboard. — MidJourney

GPTZero is a web app and service designed to detect whether a body of text has been written by a human or by an artificial intelligence. Currently, the system can ostensibly detect the output of a variety of large language models, including ChatGPT, GPT-4, and Claude, as well as text written by a human in collaboration with an AI.

It was developed and initially released in January 2023 by Edward Tian, a 22-year-old undergraduate studying computer science at Princeton University and a former software engineering intern at Microsoft. While announcing the platform on X (formerly Twitter), Tian noted that the analysis was based on the research of Princeton Ph.D. candidate Sreejan Kumar, and the work of Princeton’s Natural Language Processing Group.

the analysis is based on some ongoing research with and @sreejan_kumar and @princeton_nlp. hopefully we’ll publish something empirical soon. but in the mean time this was a fun app to make 🙂

— Edward Tian (@edward_the6) January 3, 2023

Is GPTZero free?

GPTZero was designed for educators, but anyone can use it for free. With a free account, you can scan 40 documents per hour and access the GPTZero dashboard. The $10/month Essential plan will scan up to 150,000 words per month, and grant access to “premium” AI detection models as well as “plagiarism scanning” and “Advanced Grammar and Writing” feedback. The $16/month Premium package increases the word count to 300,000 per month and offers an “Advanced AI Deep Scan” and multilingual AI detection in addition to the Essential-level benefits. The top-tier $16/month Professional subscription doles out 500,000 words per month with another 10 million words “in overage.” That’s a lot of perceived plagiarism.

Is GPTZero accurate?

While GPTZero touts its service as highly capable, some users have found the service’s accuracy “to be inconsistent, often mislabeling human-written text as AI-generated and struggling with certain types of generated text.” Following a suggestion from Reddit user Smellz_Of_Elderberry, I asked ChatGPT to write a brief story about the book The Old Man and the Sea as if it were a high school student. GPTZero wasn’t fooled.

ChatGPT writing as if it’s a high school student.

I tried again, altering the text with some misplaced punctuation, incorrect verb tense, and other small errors, but GPTZero still stated, “your text is likely to be written entirely by AI.”

The scan correctly guessed a passage’s AI origins even when using text generators other than Claude or GPT-4. I had Gemini 1.5 Pro write a separate report on The Old Man and the Sea but GPTZero caught that as well.

The accuracy of GPTZero is still being assessed, but based on these anecdotal tests, it seems to be working well.

If you use GPTZero, bear in mind that errors are possible. Whether you’re using GPTZero to detect AI-generated text or ChatGPT to help write a document, you still need to check the work for mistakes.

How does GPTZero work?

GPTZero's AI text assessment includes statistics of perplexity and burstiness.

GPTZero analyzes two statistics: perplexity, which measures how random or unpredictable the text is, and burstiness, which measures how much that randomness varies across the text. AI-generated text tends to be very consistent in its perplexity from one sentence to the next, while human writers vary it without any awareness.
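To make those two terms concrete, here is a minimal Python sketch. It is not GPTZero's method, which has not been published: it assumes a toy word-frequency model in place of a real language model, and it assumes burstiness can be approximated as the standard deviation of per-sentence perplexity. The names `UnigramModel`, `perplexity`, and `burstiness` are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter
from statistics import pstdev


def tokenize(text):
    """Lowercase the text and return its words."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


class UnigramModel:
    """Toy stand-in for a language model (an assumption for illustration):
    add-one smoothed word frequencies learned from a reference text."""

    def __init__(self, reference_text):
        tokens = tokenize(reference_text)
        self.counts = Counter(tokens)
        self.total = len(tokens)
        self.vocab_size = len(self.counts) + 1  # +1 bucket for unseen words

    def prob(self, word):
        return (self.counts[word] + 1) / (self.total + self.vocab_size)


def perplexity(model, sentence):
    """exp of the average negative log-probability per word.
    Higher values mean the model finds the sentence more surprising."""
    words = tokenize(sentence)
    if not words:
        return float("nan")
    log_prob = sum(math.log(model.prob(w)) for w in words)
    return math.exp(-log_prob / len(words))


def burstiness(model, text):
    """Spread of per-sentence perplexity (assumed definition, see lead-in).
    A value near zero means every sentence is about equally predictable."""
    scores = [perplexity(model, s) for s in split_sentences(text) if tokenize(s)]
    return pstdev(scores) if len(scores) > 1 else 0.0


model = UnigramModel("The cat sat on the mat. The dog sat on the log.")
uniform_text = "The cat sat. The cat sat. The cat sat."
varied_text = "The cat sat. Quantum zebras juggle."
print("uniform burstiness:", burstiness(model, uniform_text))
print("varied burstiness:", burstiness(model, varied_text))
```

In this sketch, text whose sentences are all equally predictable scores a burstiness of zero, while text that mixes familiar and surprising sentences scores higher. A real detector would swap the word-frequency model for a large language model and calibrate thresholds on labeled examples.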

The work isn’t done, and Tian notes that more tests will be added to improve the accuracy of AI text detection. In particular, implicit bias is an area being explored as another way to detect if the text is generated by an AI.

we’re still studying implicit bias in LM generated text right now, so hopefully will be adding a few more tests and factors to improve the model

— Edward Tian (@edward_the6) January 3, 2023

How can I use GPTZero?

GPTZero is available on its website. Simply copy the text you’d like to check and paste it into the big box labeled Try it out.

GPTZero's website is quite simple with a text box and a submit button. — Image used with permission by copyright holder

You can also upload a PDF, Word document, or text file instead. Either way, check the box signifying that you agree to the terms of service, then click the Get Results button.

Alternatives to GPTZero

GPTZero isn’t the only AI-powered plagiarism detector on the market today. OpenAI offers its GPT-2 Output Detector and has reportedly developed an updated version, though there is no word on when or if it will be released. Content at Scale AI Content Detection, ZeroGPT (not sure how that made it past the trademark office), Writefull GPT Detector, and Originality.ai all offer similar services with varying degrees of accuracy.

Why is my writing being flagged as AI?

With the rise of ChatGPT and of AI detection tools, both writers and readers have new worries: how to tell whether content is AI-created, and whether genuine writing is being labeled as coming from an AI. This is particularly a problem for students, who could face consequences from their schools or universities if they are found to be using AI. Some students now habitually run their own original work through detectors like GPTZero and find that it flags sentences as AI-written even when they weren’t.

In 2024, Ian Bogost, a writer for The Atlantic, described running his own original work through plagiarism detection software and finding that, initially, a staggering 74% of his work was flagged as plagiarized. With careful checking and elimination, he managed to get that number down to zero, but it took him several hours of review and settings adjustments to get there.

AI detection is similar to plagiarism detection in that both can only offer a best guess as to whether a piece of writing is original and human-generated. These tools require a lot of careful review, as both tend to produce false positives. If you’re finding that your work is being flagged as AI-generated on GPTZero when it wasn’t, the reasons can be as broad as not being a native English speaker, being too repetitive with your ideas, or having used a tool like Grammarly.

If your work is being flagged as AI, double check that you have all your quotes and citations formatted properly, and try to avoid using automatic tools like Grammarly for making edits.

And remember, this is GPTZero’s black box, “trade secret” proprietary algorithm that’s claiming your writing statistically resembles other examples found across the entirety of the public internet. The company isn’t going to explain how its product actually works, or demonstrate it does so accurately, in a court of law. So if you do find yourself in jeopardy over alleged generative plagiarism, it’ll be your word against theirs. Lawyer up and make them prove their work.

Do we really need plagiarism checks?

Pushing far beyond the research lab that many text-generation AIs have been bound to, OpenAI released ChatGPT to the public in late November 2022. By January 2023, ChatGPT had over 100 million users, making it the fastest-growing public application yet.

That means any concerns about plagiarism are only going to increase as this AI assistance becomes available in all corners of life. Microsoft is incorporating OpenAI’s technology into Bing search, and Google is testing its own version, known as Gemini (formerly Bard).

A color painting of a laughing robot, generated by Dall-E. — Image used with permission by copyright holder

On a related note, AI image generators like Dall-E and Stable Diffusion are under scrutiny for potential copyright violations. All of these artificial intelligence services have been trained on the writing, photographs, and artwork found online that have been created by billions of humans.

In a way, AI is borrowing from human intelligence, not creating on its own. If I borrow from another human, I must give credit and possibly pay a licensing fee. With generative AI, it becomes more difficult to cite a source because each text or image is broken down into diffuse elements and then reassembled to create a new piece using thousands or millions of sources.

We either need to rethink how we feel about copyright and plagiarism or find tools that help identify AI-generated material and possibly develop a method of giving credit to the vast number of people that contribute to every AI-generated work.
