That scourge of all in boxes, the anonymous e-mail, might have met its match.
Researchers at Concordia University in Montreal have developed a technique to determine the origins of anonymous e-mails by analyzing the sender’s writing style. The result is a unique “write-print” that’s like a fingerprint for prose, identifying an author through a collection of writing characteristics. The method isn’t intended to reveal the spam authors who clutter your in box, but to aid law-enforcement officials in investigating crimes involving e-mail.
“Some people like to write their e-mails in all lowercase; some people like to use lots of commas,” says Benjamin C. M. Fung, an assistant professor in information-systems engineering at Concordia, who helped develop the approach. A paper describing the technique will be published in a forthcoming issue of Information Sciences.
The Concordia researchers tested their technique on a trove of internal Enron e-mails, a byproduct of court cases against executives of the infamous energy company. The researchers achieved 80 percent to 90 percent reliability in determining the authorship of the more than 200,000 e-mails.
Currently, attempts to track down the authors of anonymous e-mails rely primarily on an Internet-related identifier attached to a message, known as an IP address. It’s tied to a physical address where the server that first forwarded the message is located. But that physical address is often like a large apartment building. It could have multiple occupants who move around frequently, and so it can’t prove who actually wrote the e-mail.
This is no more reliable than using the self-reported address on a piece of mail, says David Brumley, an assistant professor in electrical and computer engineering at Carnegie Mellon University. He works with Carnegie Mellon’s CyLab, a cybersecurity research center.
Previously a computer security officer at Stanford University, Mr. Brumley says tracking down the authors of anonymous e-mails is extremely difficult.
“If the person is intelligent in what they do and plans it out, it usually leads to a dead end,” he says.
The method developed by the researchers at Concordia uses data-mining techniques to search e-mails automatically for the characteristics of an author’s writing style.
The technique can be used to whittle down the pool of potential suspects identified by an IP-address search, Mr. Fung says, by comparing anonymous e-mails with other examples of the suspects’ writing. The team’s latest research infers a profile of the sender, including characteristics such as his or her gender, race, age, and language proficiency, among others. Mr. Fung says this is particularly useful in helping investigators focus their search in the earlier stages of an investigation. It could also help determine whether phishing messages, which pose as official communication and try to solicit sensitive user information, are the product of one author or several.
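To make the idea of comparing an anonymous message with suspects’ known writing concrete, here is a minimal Python sketch. It is not the Concordia method, whose details are not given here. The three features (share of lowercase letters, commas per word, average word length), the distance measure, and the names `style_features` and `closest_author` are all assumptions chosen for illustration, loosely inspired by Mr. Fung’s examples of lowercase writing and heavy comma use.

```python
import math
import re


def style_features(text):
    """Return a small, illustrative vector of writing-style measurements.

    These three features are assumptions for this sketch, not the
    feature set used by the Concordia researchers.
    """
    letters = [c for c in text if c.isalpha()]
    words = re.findall(r"[A-Za-z']+", text)
    n_letters = max(len(letters), 1)  # avoid division by zero on empty text
    n_words = max(len(words), 1)
    return [
        sum(c.islower() for c in letters) / n_letters,  # share of lowercase letters
        text.count(",") / n_words,                      # commas per word
        sum(len(w) for w in words) / n_words,           # average word length
    ]


def distance(a, b):
    """Euclidean distance between two feature vectors (features are unscaled)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_author(anonymous_text, known_samples):
    """Return the name whose known writing is stylistically nearest.

    known_samples maps a (hypothetical) suspect name to sample text.
    """
    target = style_features(anonymous_text)
    return min(
        known_samples,
        key=lambda name: distance(target, style_features(known_samples[name])),
    )


if __name__ == "__main__":
    # Made-up sample messages, purely for demonstration.
    samples = {
        "alice": "hi bob, sure, i will send it, ok, thanks, bye",
        "carol": "Please Review The Attached Document Before Tomorrow Morning.",
    }
    print(closest_author("ok, fine, see you then, talk soon, bye", samples))
```

A sketch like this also shows why such a comparison can be explained afterward: each feature is a plain measurement that can be reported alongside the conclusion. A real system would need far more features, normalization, and much more text per author than this toy example uses.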
Mr. Brumley says the Concordia approach, by focusing on the composition of anonymous messages rather than their point of origin, could be a huge improvement over existing techniques.
While the Concordia researchers aren’t the first to develop the idea of a write-print, the advantage of their technique is that its results can be broken down into a series of writing-style characteristics that justify the conclusion, says Mourad Debbabi, a professor and director of the Concordia Institute for Information Systems Engineering who worked with Mr. Fung.
Other techniques aren’t as transparent about how they identify the author of anonymous e-mails, and that presents problems in court, he says. “In a court you need to not only make a claim but also explain how you reached that conclusion.”
In future research, the Concordia team hopes to identify the authors of anonymous chat-room and text messages, Mr. Fung says, although these media present greater challenges because the messages are shorter and thus offer less material from which to discern writing characteristics.