Sunday, January 21, 2018

I can't steal (text) from myself, can I??? (hint: your text might not be yours anymore)

So, a journal policy issue came up for me this week. I've anonymized the details, so I can discuss the big-picture conceptual issues.

We had an author of a paper who recycled big chunks of the methods from his/her past paper(s). Same data (analyzed in new ways), so same methods, so why rewrite the whole thing? As a result, more than 1/4 of the whole paper was recycled text. We pointed this out.

The author responded that big long-term data studies are often the subject of multiple publications that all rely on the same data set and hence the same methods to collect the data. This is actually quite a common issue in biology, and not at all unique to datasets like the one in this manuscript. Moreover, even new datasets can be obtained through the same methods as previous papers, such as when I use the same molecular genetics protocol to obtain DNA sequence data, albeit for different biological specimens. So, this is indeed a recurrent subject of angst and confusion.

So, why the policy barring extended re-use of one’s own text?

(cartoon by Cyanide and Happiness)

Before answering that, I want to point out that very modest re-use, a sentence here or there, or sentence fragments, is common and does not typically raise any red flags (as long as the text is one’s own). What we are concerned with here is extensive re-use of very large blocks of text. Many journals have software that automatically scans for re-use of text available on the web. iThenticate scores are often 5% to 10% repeated text, and this is often driven by citations in the paper, which are by necessity repeated.

The fundamental reason for this policy has to do with copyright law.  If you hold the copyright to the previously published text, or if it is not copyrighted, then there is no official legal problem. For instance, your grant proposal contains lovely introductory and methods text, and is not copyrighted. You may re-use chunks of it for a submission.

But, authors rarely retain unfettered copyright to their published work. Chances are, that previously published paper is copyrighted and the copyright probably belongs to the publisher. So, even though you authored a previous methods section, you need to get permission from the publisher of the previous paper (or whoever holds its copyrights) to re-use large chunks of text. So, this isn’t our journal’s arbitrary editorial policy that we can change on a whim. It is our obligation to follow copyright law. There is a caveat here: there is the option for “fair-use” in citation, but this usually applies to small stretches of text, not extensive quotation. And, to qualify as fair-use under the law the quoted text must be clearly denoted as a quotation (e.g., surrounded by quotation marks), with attribution to the original source.

Given that we do not have legal flexibility here, what is an author to do? 

First, one can subtly rephrase the original text. This is what most of us do when we submit repetitive methods sections; we reorganize sentences, substitute synonyms when possible, and generally make the text a non-exact copy. That’s usually not too hard to do.  

Second, one can actually place the methods section in question in quotes, with attribution to your previous work. That’s atypical in our field (I’ve never ever seen it done). But, it is legal though it will raise eyebrows. 

Third, you can shorten your methods and refer readers to the methods section of the prior paper. This is commonly done, though reviewers (and readers) sometimes get annoyed that not everything is written down in one place. But really, we all implicitly build our methods on the backs of past papers (I don’t, for example, re-derive a proof for an ANOVA or regression, each time I use statistics), so it’s not unreasonable to do this third option.

For more on this topic, there are a wide variety of online resources commenting on this issue. You are clearly not the first to ponder this subject, given the extensive list of readings available on “self-plagiarism”. Reading these sources reiterates for me that our journal’s policy is legally necessary and typical within the natural sciences. Here are a few sources:

There is a counter-argument here, of course. Quite a few people on Twitter and elsewhere have responded that they feel this rule and tradition is misguided. Some just seem to feel it is unjust that authors don't have the right to reuse their own text whenever they want. Some expressed frustration at the waste of time required to rephrase text to say the same thing. The author of the manuscript in question was dissatisfied by my narrow explanation of the rule. My saying "it's the law" does not explain the rationale behind this. Laws can be changed. But this is broader than just the law; it is also an academic norm. People trying to push this boundary will meet with more resistance from their academic peers than from lawyers.

I've been pleased to see that this post has generated some serious conversations on Twitter (and perhaps elsewhere). Some quite critical of my post. I want to respond by making a few things clear:

1) I am not a lawyer. My statements above about legality are my understanding from discussions with people in journal publishing (also not lawyers, but better informed than I). I also based my comments on the various links above.

2) I don't actually seriously endorse all of the solutions listed above. For instance, the block quote of methods was meant in jest (I ignored the number one rule in online writing: irony does not translate well to text). Lenny Teytelman took me to task for suggesting this option. Lenny also disliked my statement that you can cite a prior methods section of another paper. I totally agree with Lenny that this can lead to chains of poorly detailed methods that undermine our goal of reproducibility. I noted that this chain of citing other methods is common (it is), and annoying (it is). He correctly points out that it can also lead to ambiguity. To be clear, I was listing these two options for sake of completeness, not endorsement.

3) Lenny also disliked my more serious comment: rephrase. He said this undermines precision. I disagree. This is done all the time: slightly modifying bits of prose in ways that are not "artificially changing method details", just changing the words and sentence structures used to describe them in ways that are truly synonymous. If there's really only one way to say something, then by all means say it the same way. I just doubt that's often true.


  1. All true, but still an annoying waste of time. Option one is what I generally end up doing, since option 2, as you say, raises eyebrows, and option 3 is frowned upon by readers and reviewers (I have been told not to do it, when I attempted to :->). So, we end up re-wording large chunks of our own text for no really good reason except that plagiarism-detection filters tell us to, and we're afraid of getting sued. Seems like a system that ought to be redesigned – but obviously it is beyond any one journal's power to make that happen. It's worth noting that the slight, cosmetic rewording typically done to evade this issue as part of option one is probably, in fact, insufficient to actually insulate against charges of copyright infringement, any more than a slight, cosmetic rewording here and there in the text of, say, a Michelle Obama speech would be sufficient to insulate, say, Melania Trump from charges of plagiarism. The present situation is tenable only because there is a sort of détente among copyright holders in academia; but I would guess that copyright infringement lawsuits could be waged and won over a great many papers being published nowadays. If we can negotiate that détente, why not negotiate a détente that goes one step further and allows reasonable verbatim quoting of materials and methods?

  2. I am going to try placing the entire methods section in quotes just to see what happens.

  3. Another angle here is that if your earlier paper was multiply authored, even if authors retained copyright, which ones now hold it? If the copyright is jointly held (as in "copyright The Authors"), then presumably you'd need licenses from all the coauthors. Which is clumsy at best! And yes, we'd all prefer not to have to worry about little things like legality, but it's not on journals to overturn centuries of laws.

    I'd also wonder how common it really is that re-using older Methods text is optimal. If the methods are so totally standard that readers of every paper need the exact same information about them, then they are approaching the ANOVA in Dan's example. If they are really novel, then readers of each paper may need different aspects/details emphasized. So how big is the middle ground where readers really need the methods, but they don't need any different perspective or detail each time? Sure, it's easier for the writer to just repeat. But what matters is what's easier for the *reader*!

  4. As an Editor I have had to argue with the publishers and authors about this on many occasions :-(


