2 pepper rating

For the last dozen or so years we’ve heard ourselves incessantly reminding everyone that the “www” in most URLs means “worldwide web,” while the “e” in “e-commerce” all too often stands for English. Our research on e-GDP (online GDP) and the Availability Quotient demonstrated that many companies still have a long journey before they can meet the demands of the world’s markets for local-language content. That gap is nowhere more apparent than in Asia, where the amount of in-language content is dwarfed by the growing online population.

Just how dwarfed? Today, roughly 38% of internet users live in Asia, but by 2012, that number will jump to half. However, local-language content hasn’t kept pace. In 2007, non-Asian languages accounted for roughly 86% of the content on the web. Most of the remaining 14% was split among Japanese (6%), Chinese (6%), and Korean (1.5%). All other Asian languages make up less than 0.03% of the web’s content; for example, Southeast Asian languages account for fewer than 10 million pages. Given consumers’ preference for content in their own language, that wide gap between Asian-language content and the online population represents a huge opportunity.

That opportunity has not gone unnoticed. After getting an eyes-only, tell-no-one pre-briefing in December, we recently spoke with Asia Online CEO Dion Wiggins, who called to tell us that his portal had just scored its first round of funding from JAIC, the Japanese venture capital firm behind Alibaba.com, among others. He also wanted to let us know that Kirti Vashee, formerly VP of marketing at Language Weaver, had signed on as Asia Online’s VP of sales for the Americas and Europe, with responsibility for selling the commercial version of its MT engine.

Asia Online’s plans revolve around a proprietary machine translation engine plus a strong support infrastructure of humans, content, and partners. Four elements are key to this strategy:

  • New technology. Asia Online developed high-performance statistical machine translation (SMT) software in collaboration with University of Edinburgh professor Philipp Koehn.
  • Clean corpora. Asia Online contracts with publishers, language service providers, and eventually corporations for human-translated content to train its SMT engine. The company also crowdsources the quality via a large community of students, and feeds the validated content back into the system as training data.
  • Matrixed language learning. The SMT engine can take translations of a novel into English, Japanese, and Thai and use the permutation to train itself on English<>Thai, English<>Japanese, and Japanese<>Thai. This capability is especially important for languages that don’t have enough content to feed a data-hungry statistical MT engine.
  • Real-time fixes. Its MT engine lets reviewers observe translation decisions as they are being made, allowing them to influence choices, make fixes in place, and propagate these modifications to wherever that phrase or term is used.
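
To make the "matrixed" idea concrete, here is a minimal Python sketch of how one text translated into three languages can yield training data for three language pairs. It is our illustration, not Asia Online's code: the `aligned` sample data and the `pairwise_corpora` helper are hypothetical, and it assumes the translations are already aligned sentence by sentence.

```python
from itertools import combinations

# Hypothetical sample: the same two sentences, already aligned by position,
# in English, Japanese, and Thai.
aligned = {
    "en": ["Hello.", "Thank you."],
    "ja": ["こんにちは。", "ありがとう。"],
    "th": ["สวัสดี", "ขอบคุณ"],
}

def pairwise_corpora(aligned):
    """Build one sentence-pair corpus for every unordered pair of languages."""
    lengths = {len(sentences) for sentences in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("every language needs the same number of aligned sentences")
    return {
        (a, b): list(zip(aligned[a], aligned[b]))
        for a, b in combinations(sorted(aligned), 2)
    }

corpora = pairwise_corpora(aligned)
for pair, sentence_pairs in corpora.items():
    print(pair, len(sentence_pairs))
```

Three languages give three pairs, and each added translation of the same text adds one new pair per language already covered, which is why the approach would matter most for languages with little content of their own.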

Asia Online is talking with LSPs interested in using its SMT engine and has fielded corporate requests to use its software. We think that its real value lies in its Google-esque plan to drive billions of eyeballs seeking content in their own languages — and the advertising, special offers, and the next-generation linguistic tools that are sure to follow.
