The Blogosphere: Lost in Translation?

freeandfun1

It could be if translation technology can't keep pace with the instantaneous, spontaneous nature of communications on blogs and social networks.

Bernard Moon [Found Next Gig, AlwaysOn] | POSTED: 02.23.05 @09:28

Last week I was sitting with AlwaysOn streaming guru Vassil Mladjov at a café talking about various business ideas when he mentioned online translation software. We briefly discussed the quality of translation—according to Mr. Mladjov, the Russian-to-English translation is good enough to get basic ideas across; this is not the case with Korean-to-English (which I found to be horrible)—and then moved on to other topics, translation software not being a major area of interest to me at the time.

Later that same week, however, I came across an article from Reuters that made me once again think about language and technology. It seems that the head of France's national library, Jean-Noel Jeanneney, has raised a "war cry" over Google's plans to put some of the world's great libraries on the Internet. His beef? He believes the move could lead to a "domination of American ideas" and the English language.

Although I was a bit amused by this French take on Google's plans—after all, how bad can it be to put the world's greatest libraries online?—it did make me wonder how concerned the rest of the world is about this issue. More importantly, it made me wonder how much of a barrier language will present to the instantaneous exchange of ideas in a future dominated by blogs and social networks—mediums whose very existence is predicated on the free and timely exchange of information without editorial constraints or barriers to entry.

I thought about my mother's favorite Christian philosopher, Jacques Ellul (a French philosopher whose books she forced on me with as much vigor as she did Allan Bloom's The Closing of the American Mind). If Ellul's books hadn't been translated into a language my mother could understand, she'd never have experienced the joy of being exposed to his ideas. I then extrapolated that situation to the present, and thought about Ellul sharing his ideas in real time on his very own blog: How could someone like my mother, who would be ecstatic at the idea of an Ellul blog, benefit from such spontaneous missives if they were being written in a language that was foreign to her?

That, in turn, led me to think about other topics and issues from around the globe that people not only want but need to read about from the perspectives of the people who are on the ground experiencing them. What are Iraqis saying about the situation in their country? What would the people of Rwanda have written to make us understand the horrors that took place there? And on a more positive note, what could we learn from Japanese bloggers about the latest gadgets from that country months or years before reading about them in the pages of Wired or on Engadget? Other than what people like Mohammed and Omar from the Iraq the Model blog—who are writing in English—have to say, we're missing out on these voices because we don't understand their language.

As I pondered these things, it became clear to me how important instantaneous translation will be in the rapidly growing blogosphere. Blogs and social networks are making an already small world even smaller by providing a growing universe of timely, relevant, and important information from people of all countries on just about any subject you can imagine. To be limited by language to just a particular subset of this information would be an unfortunate bind indeed.

But that seems to be precisely the situation we find ourselves in today. Have you ever clicked "Translate this page" when a foreign-language site comes up on Google? Try Korean for a news article or blog post, and your translation results will be 50% accurate at best (and typically far worse); in other words, woefully inadequate. The technology used to translate text within documents on the web is called Machine Translation (MT), and a couple of the organizations leading the charge are the French company Systran (whose MT technology is used by companies such as Ford, Cisco, and Google) and the Israeli company Babylon.

Research firms such as IDC (a subsidiary of International Data Group) position Systran as the clear leader in this space. And Systran, which has been developing the technology for more than 35 years and currently offers its product in 36 languages, does indeed offer the most comprehensive MT system. That said, the product is ideal only for basic translation and is unsuitable for strings of more than two sentences; in other words, it is not very useful for blogs. (Some customized projects, such as ones for Ford, have been effective; however, such projects are not applicable or available to the general public.)

In describing MT technology (and its limitations), IDC's report "Overcoming the MT Quality Impasse" states, "MT systems work with natural language—a data set that is infinitely varying, ambiguous, and structurally complex. To translate adequately, an MT system must encode knowledge of hundreds of syntactic patterns, variations, and exceptions, as well as relationships among these patterns ... A human translator prioritizes and selectively applies linguistic rules based on this knowledge. MT software, unless explicitly coded for each possibility, cannot. Thus, MT will never attain the overall quality of human translation."

Leading the way in the development of MT products and technology are companies like Systran, IBM, Sail Labs, and LogoVista. Microsoft Research may also be dabbling in the area, but I don't believe it represents a significant area of investment for the organization. Looking at this line-up makes me wonder what priority the IT industry places on MT technology: Are enough government and corporate research dollars being invested in this space?

While some experts believe MT technology will be able to accurately translate full paragraphs and blog posts within several years, others believe that capability is still more than a decade away. We must do whatever we can to shorten that timeline—which means making it a primary initiative for the U.S. research community. The implications of simultaneous translation are enormous: Not only will it make governments, corporations, and cultures more transparent, it will make the world more transparent—and that's something that will benefit us all.

Bernard Moon is an entrepreneur responsible for a handful of startups related to video-on-demand, voice recognition, and computer software. He is also a freelance writer and blogger at his personal, politically leaning blog.
 
