Open Source OCR: Russian OCR engine to be published as FOSS
OCR is one of the few markets that are not yet fully internationalized. For now, an OCR engine that can decently process Cyrillic texts can only come from Russia, and there are no more than two at the moment: ABBYY FineReader and Cognitive Cuneiform.
Both trace their origins to late Soviet-era government research projects that were commercialized in the nineties. However, Cuneiform started to lose its position in the consumer market by the end of the decade, the application has seen very little progress since 2000, and it is now generally unknown among end users. Cognitive, which has by now shifted to the systems integration market, has finally decided to open up Cuneiform: to make it available as freeware immediately on a dedicated website and to publish it under an open source license in March 2008.
What makes it interesting is that Cuneiform will be the second OCR system to be published as Open Source after years of development inactivity, following Tesseract, which HP published in 2005. Thus, the market of Open Source OCR will quite unexpectedly become competitive.
The most probable idea behind the decisions of both Cognitive and HP is to put idle resources to work so that they start producing at least a minimal benefit. It looks like a simple ‘let’s see’ action, and no clear business model seems to lie behind it.
But with the Russian authorities' recently increased interest in Free Software usage at middle schools, the demand for the liberated Cuneiform could become considerable. However, until the government’s plan to shift all schools to Free Software by 2009 is fulfilled at least partially, it is very difficult to say what this state-supported middle-school FOSS market will look like and what its rules will be. But if it becomes a reality, Cognitive has every chance to be a player there simply by having used its available resources in a smart way at the right moment.
Roberto Galoppini 10:14 pm on January 29, 2008 Permalink
Ciao Egor,
I just searched for OCR on ohloh, an open source network (which itself just went open source) aimed at providing visibility into FOSS development. I think you might sign up and become a contributor, promoting Cuneiform as soon as it is released as open source.
Emily 5:05 pm on January 31, 2008 Permalink
This is excellent news. I have no expertise in the Russian language and have been trying to do research on old propaganda posters in our library. Now I can try some digital translation tools on a few of the pamphlets I have around. Thanks so much for posting this!
Egor Grebnev 6:02 pm on January 31, 2008 Permalink
Emily,
Glad to know it was helpful for you!
Max 11:58 am on July 1, 2008 Permalink
Very useful information for me. Thank you.
kfke 7:00 pm on July 30, 2008 Permalink
please send me OCR
alex 8:39 pm on August 21, 2008 Permalink
I’m translating a book from Russian to English, and I want to build a tool to do this for me. After scanning all the pages, I will run this tool and watch it work its magic. This is great info. Thank you.