Thursday, 20 November 2008 — 10:57am | Computing

So I just punched my blog into Typealyzer, a text-analysis tool that classifies its input in accordance with the 16 psychometric pigeonholes of the Myers-Briggs Type Indicator. Holy crap is it good:

The analysis indicates that the author of is of the type:

INTJ – The Scientists

The long-range thinking and individualistic type. They are especially good at looking at almost anything and figuring out a way of improving it – often with a highly creative and imaginative touch. They are intellectually curious and daring, but might be pshysically hesitant to try new things. [Is that psychically or physically, or both? — ed.]

The Scientists enjoy theoretical work that allows them to use their strong minds and bold creativity. Since they tend to be so abstract and theoretical in their communication they often have a problem communcating their visions to other people and need to learn patience and use conrete examples. Since they are extremly good at concentrating they often have no trouble working alone.

Typealyzer is still in beta, but you can try it out on your favourite online publications. Keep in mind that it is an analysis of the text, not the person—or to look at it another way, an inferred reading of the writer’s persona through the text. For bonus points, try it on group blogs.

I wish the wacky minds behind it would disclose more about their algorithms than the layman's summary in the FAQ. I'd like to know how much tuning they did, if any, in categorizing the texts they used as a statistical corpus. I'm also curious which specific factors, if any, they tried to model and weight. Syntactic structure is a no-brainer, but I wonder whether they considered the use of personal pronouns, sentence and word lengths (akin to the Flesch-Kincaid formula but put to different ends), abstractness of vocabulary, or any number of other factors that often reveal style. The question is all the more tantalizing given the metrics displayed on the brain-activity map that the program outputs alongside the Myers-Briggs type.
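Here is a minimal sketch, in Python, of how a few of those surface features might be measured. To be clear, this is not Typealyzer's method, which isn't public. The feature set, the first-person pronoun list, and the vowel-group syllable counter are my own illustrative assumptions, and the syllable heuristic in particular is crude. The grade-level line is the standard Flesch-Kincaid formula.

```python
import re
from dataclasses import dataclass

# Illustrative stylometric features only; NOT Typealyzer's (undisclosed) algorithm.
# Assumption: a small, hand-picked set of first-person pronouns.
FIRST_PERSON = {"i", "me", "my", "mine", "myself",
                "we", "us", "our", "ours", "ourselves"}


@dataclass
class StyleProfile:
    mean_sentence_length: float  # words per sentence
    mean_word_length: float      # characters per word
    first_person_rate: float     # first-person pronouns per word
    fk_grade: float              # Flesch-Kincaid grade level


def count_syllables(word: str) -> int:
    """Crude heuristic: count runs of vowels, dropping a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)  # every word gets at least one syllable


def style_profile(text: str) -> StyleProfile:
    # Naive sentence split on terminal punctuation; drops empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    n_sent, n_words = len(sentences), len(words)
    syllables = sum(count_syllables(w) for w in words)
    # Treat contractions like "I'd" as their leading pronoun.
    first_person = sum(1 for w in words if w.lower().split("'")[0] in FIRST_PERSON)

    return StyleProfile(
        mean_sentence_length=n_words / n_sent,
        mean_word_length=sum(len(w) for w in words) / n_words,
        first_person_rate=first_person / n_words,
        # Flesch-Kincaid grade level:
        # 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
        fk_grade=0.39 * (n_words / n_sent) + 11.8 * (syllables / n_words) - 15.59,
    )


if __name__ == "__main__":
    print(style_profile("I think. We write."))
```

On its own, a profile like this says nothing about Myers-Briggs types. You would still need a labelled corpus and a classifier trained on it to map such numbers onto the sixteen pigeonholes, which is exactly the part I'd love to see documented.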

I think it would be conceptually impossible to produce an absolute, generative parametrization of style (for reasons I won’t get into here), but a statistical analysis like this one—and the strength of its correlation to our intuitive estimations—could go a long way towards a better formal understanding of that elusive quality we call “writer’s voice”. My computational linguistics nerves are all atingle.


