ROBERT SIEGEL, host:
Perhaps like us, you assume that the language of tweeting, as in communicating by Twitter, whatever its shortcomings, was at least reasonably uniform no matter where its users are atwitter. Well, not so. Thanks to Jacob Eisenstein, we now know that while in Southern California something might be coo - that's cool without the L - a Northern Californian who's making the very same brief trenchant observation is calling it koo, spelled with a K. And if you're tweeting in New York, you're doing suttin, and that doesn't mean that you're on Sutton Place.
Jacob Eisenstein is a postdoctoral researcher at Carnegie Mellon University. He and his team analyzed more than 380,000 tweets and discovered that even in 140 characters or less, there are regional dialects.
Jacob Eisenstein, welcome to the program.
Mr. JACOB EISENSTEIN (Postdoctoral Fellow, Carnegie Mellon University): Thanks.
SIEGEL: And first, those New Yorkers who were doing suttin, what does that mean?
Mr. EISENSTEIN: It just means they were doing something. It's a nice example of a phenomenon that we see in a couple of different places in Twitter.
You have a standard form, something, which is used throughout the U.S. You have more phoneticized forms that are spelled more how they might be pronounced, like sumthin, S-U-M-T-H-I-N. And then we have a very specific form to New York City, suttin, which is really almost never used outside of sort of the immediate area around New York City.
SIEGEL: What's another pretty good regionalism that you discovered?
Mr. EISENSTEIN: Well, one that we were expecting to find because we had some evidence from speech is a word called hella, which, you know, if you spent any time living in Northern California, people tend to associate with the Northern Californian spoken dialect.
Mr. EISENSTEIN: Hella, yeah, and it just means very: I'm hella tired. And indeed, that shows up on Twitter, too. So that was just some confirmation that we were finding things that did sort of derive from speech.
On the other hand, we found things that really seem unconnected to speech at all. The example you mentioned at the beginning, koo, which you could start with a C or with a K, it's really impossible to speak that difference, I think.
(Soundbite of laughter)
SIEGEL: You should hope.
Mr. EISENSTEIN: So this is something that I think is really unique to, maybe to social media or to written communication.
SIEGEL: Anything in here that truly surprised you about the differences that have developed so quickly on Twitter?
Mr. EISENSTEIN: Yeah, there seem to be regional differences that are almost completely unconnected from spoken language that are really unique to phenomena we find in written social media.
For example, there are a lot of abbreviations. Everybody, or a lot of people, know LOL, which is laugh out loud. And there are a lot of ways to say that things are funny on the Internet.
So LOL is a form that you'll see throughout the U.S. There are other forms that have the same meaning, that are much more regionally distinct. And unfortunately, most of these forms are things that you can't say on the radio, but again just sort of acronyms for things that are funny that you'll find, for example, just in Pennsylvania. There's another one that you'll find sort of centered on Washington, D.C., and just in the Mid-Atlantic area, but again things that would really never find their way into a spoken conversation.
SIEGEL: As you are applying computational research to linguistics, I mean, do you find that something different has happened here, that first email, then texting, then Twitter, with all of its improvised shorthand and creative misspellings, is in fact making written language more like spoken language?
Mr. EISENSTEIN: Yeah, I think that's a great observation, and that's, to me, exactly what's so exciting about studying Twitter. You know, these sorts of regional differences are things that we've known about for a long time in spoken language, and in fact, you know, it's the case that spoken English between different cities in the U.S., linguists believe it's more different now than it was 100 years ago.
But in written language, up until very recently, the only data that you would have available would be very formal written texts, things by journalists or laws, things like that. So there was really no way to identify that kind of regional variation until now.
And now, through social media, we have written communication that's being used in a very conversational, informal way, and so we're starting to see all the same richness and diversity that we see in spoken language, we're starting to see that in written language, too.
SIEGEL: Well, Jacob Eisenstein, thank you very much for talking with us about regional dialects in Twitter.
Mr. EISENSTEIN: Thank you.
SIEGEL: Jacob Eisenstein is a postdoctoral researcher at Carnegie Mellon University.
(Soundbite of song, "Rockin' Robin")
Unidentified Man: (Singing) ...love to hear the robin go tweet tweet tweet. Rockin' Robin. Tweet, tweet, tweet. Rockin' Robin. Tweet tweetly deet.
MELISSA BLOCK, host:
You are listening to ALL THINGS CONSIDERED. Transcript provided by NPR, Copyright National Public Radio.