
Text attribution: theory and practice

https://cdn.pixabay.com/photo/2015/11/19/21/11/knowledge-1052014__340.jpg

Is it possible to determine the authorship of a literary text? Yes, and above all with the help of archival sources, that is, historical documents. If we need to find out who wrote an article that was published under a pseudonym or not signed at all, researchers look for payment records in the archives. Whatever name a person put under the article, he or she was paid for it under his or her real name. That is how we can tell whether it was Pisarev or Chernyshevsky, Dostoevsky or Goncharov. Other documents can also help with attribution. For example, lists of arrivals were usually drawn up at the entrance to St. Petersburg, so we know that a given work could not have been published by a given person if he or she was not in the city at the time. Any document outweighs other means of attribution, and working with such sources can turn into a real investigation.

But Shakespeare's identity, for example, raises many questions: apart from his will, no documents written by him have come down to us. There is a view that Shakespeare's works were written not by Shakespeare of Stratford but by someone else, and there are many candidates. Or take Homer: it is not clear whether he, whoever he was, composed the works attributed to him, which include not only the famous poems but also the so-called Homeric Hymns. He apparently lived at a time when there was no writing in Greece at all.

There are also situations where sources exist but are so contradictory that each side interprets them according to its own view of the subject. For supporters of the version that Sholokhov really wrote "And Quiet Flows the Don", the surviving manuscript of the novel confirms his authorship, while opponents believe that the text was written in his hand but composed by another person.

When there is no convincing documentary evidence, researchers resort to quantitative methods of analysis. This is the domain of computational linguistics, which uses automatic text processing to identify patterns in a text. Quantitative attribution assumes that the author somehow manifests himself in the work, leaving a kind of "fingerprint". The assumption is that there is an "authorial signal" that does not depend on the author's mood or on the topic he was writing about. This can be called the author's style, and it means that the "fingerprint" can be detected by objective means.

The emergence of quantitative attribution

The idea of quantitative attribution arose a long time ago. In the second half of the 19th century a new approach to attributing paintings appeared. It was formulated by Giovanni Morelli, who called for a rethinking of attribution as a whole. He argued that one should pay attention to details, such as how the ears or fingers are drawn. An artist most likely does not think about how to draw an ear, because he is used to drawing it in his own particular way. In the 20th century, Carlo Ginzburg returned to Morelli's idea and called it the "evidential paradigm".

Evidence in a text is something the author does not control, something he does not think about while writing. Some believe that texts can be attributed on the basis of shared words. A typical example is a disputed text in the history of 18th-century Russian literature, "An Excerpt from a Journey to *** by I*** T***". In the 1960s some literary critics claimed that since the title contains the word "journey", the author must be Radishchev, because he wrote "Journey from St. Petersburg to Moscow". This hypothesis is not very convincing, if only because such a significant word attracts attention and is used consciously, especially in a title. The author's trace lies in something that is not closely watched by the reader, the author, the editor or anyone else.

Thomas Mendenhall, at about the same time as Morelli, suggested that word length could be such an uncontrolled parameter. He counted word lengths in Shakespeare's texts and concluded that Francis Bacon, in whose writings word lengths were distributed in roughly the same way, was Shakespeare. But, first, Mendenhall made a mistake in his calculations, and second, word length by itself is not a parameter that can determine authorship. Later, words, lexemes, the distribution of case forms and so on were tried as such a parameter. However, none of these gave convincing results across the board.
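To make the word-length idea concrete, here is a minimal Python sketch that computes the share of words of each length in a text. It is only an illustration of the general principle, not a reconstruction of Mendenhall's actual procedure; the function name `word_length_spectrum` and the tokenization rule (runs of letters, so apostrophes and hyphens split words) are assumptions made for this example.

```python
import re
from collections import Counter


def word_length_spectrum(text):
    """Return {word_length: share of words with that length} for a text.

    Assumption: a "word" is any run of letters; digits and punctuation
    are ignored, so "don't" is counted as "don" and "t".
    """
    words = re.findall(r"[^\W\d_]+", text)
    if not words:
        return {}
    lengths = Counter(len(word) for word in words)
    total = sum(lengths.values())
    # Sort by length so that two spectra are easy to compare side by side.
    return {length: lengths[length] / total for length in sorted(lengths)}


if __name__ == "__main__":
    spectrum = word_length_spectrum("To be, or not to be, that is the question")
    for length, share in spectrum.items():
        print(f"{length} letters: {share:.1%}")
```

Two authors would then be compared by putting their spectra side by side. As noted above, this parameter on its own turned out to be too weak to determine authorship.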

Delta method

The breakthrough came in 2002, when John Burrows published an article entitled "'Delta': A Measure of Stylistic Difference and a Guide to Likely Authorship", in which he formulated an approach to the quantitative determination of authorship. His method became known as Delta.

For each word we compute a z-score: the difference between the word's frequency in the text (in percent) and its mean frequency in the corpus, divided by the standard deviation of that word's frequency across the corpus. Then, over all the words considered, we take the mean of the absolute differences between the z-scores of the two texts being compared. This is the delta.
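The calculation described above can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not a reference implementation: the function names, the toy corpus, the restriction to the `n_words` most frequent words of the corpus, and the use of the population standard deviation are all choices made for this example.

```python
from collections import Counter
from statistics import mean, pstdev


def percent_frequencies(tokens, vocabulary):
    """Frequency of each vocabulary word in a text, in percent."""
    counts = Counter(tokens)
    return {word: 100.0 * counts[word] / len(tokens) for word in vocabulary}


def burrows_delta(tokens_a, tokens_b, corpus, n_words=150):
    """Mean absolute difference of z-scores between two tokenized texts.

    corpus is a list of tokenized reference texts. Assumption: only the
    n_words most frequent words of the whole corpus are compared.
    """
    totals = Counter(token for doc in corpus for token in doc)
    vocabulary = [word for word, _ in totals.most_common(n_words)]
    corpus_freqs = [percent_frequencies(doc, vocabulary) for doc in corpus]
    freqs_a = percent_frequencies(tokens_a, vocabulary)
    freqs_b = percent_frequencies(tokens_b, vocabulary)

    differences = []
    for word in vocabulary:
        values = [freqs[word] for freqs in corpus_freqs]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # the word carries no signal if it never varies
        z_a = (freqs_a[word] - mu) / sigma
        z_b = (freqs_b[word] - mu) / sigma
        differences.append(abs(z_a - z_b))
    return mean(differences) if differences else 0.0


if __name__ == "__main__":
    # Toy corpus, invented for illustration only.
    doc1 = "the cat and the dog and the bird".split()
    doc2 = "a cat saw a dog and a bird".split()
    doc3 = "the dog and the cat saw the bird and the fish".split()
    corpus = [doc1, doc2, doc3]
    print(burrows_delta(doc1, doc2, corpus, n_words=3))
    print(burrows_delta(doc1, doc3, corpus, n_words=3))
```

The smaller the delta, the closer the two texts are in their use of frequent words. A disputed text would be compared with texts by each candidate author, and the candidate with the lowest delta would be the most likely one.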