I got interested in computing the readability of texts recently. I have been suffering from reader's block for the past three weeks or so, so I was not able to read much on the subject, though I skimmed through quite a bit of material. I did find some work on building language models to estimate text difficulty. But most of what I found was based on conventional readability measures, which must have celebrated their golden jubilee sometime in the past decade.
Then, I came across this decade-old paper:
“Living off the land: The web as a source of practice texts for learners of less prevalent languages”
It is about finding the right texts on the web to serve as learning material for second-language learners of Nordic languages. I was fascinated by the official title of the project described in the paper: “Corpus based language technology for computer assisted learning of Nordic languages”.
So, what is this about?
To summarize briefly, here is what they do:
1. The user supplies example text, in the form of a URL.
2. The text at this URL is evaluated for readability and other language factors.
3. Along with these statistics, the user is also presented with ten possible query terms, from that document.
4. The user chooses which of these query terms are sent to the search engine (Evreka).
5. Each of the resulting documents is then evaluated for the statistics of (2), and the user is presented with the results, along with a brief summary of each document.
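To make steps 2 and 3 a little more concrete, here is a minimal sketch in Python. It is only my illustration, not the paper's method: I am assuming LIX (a classic readability index: average sentence length plus the percentage of words longer than six letters) as the readability measure, and plain word frequency as the way to pick query terms. The function names `lix` and `candidate_query_terms`, and the `min_len` cut-off, are made up for this post.

```python
import re
from collections import Counter


def lix(text):
    """LIX = words/sentences + 100 * long_words/words (long = more than 6 letters)."""
    # Runs of letters only; digits and underscores are ignored (an assumption).
    words = re.findall(r"[^\W\d_]+", text)
    # Naive sentence split on ., ! and ? (an assumption, not a real segmenter).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100 * len(long_words) / len(words)


def candidate_query_terms(text, n=10, min_len=4):
    """Return the n most frequent words of at least min_len letters."""
    words = [w.lower() for w in re.findall(r"[^\W\d_]+", text)]
    counts = Counter(w for w in words if len(w) >= min_len)
    return [word for word, _ in counts.most_common(n)]


if __name__ == "__main__":
    sample = "The cat sat. The elephant wandered slowly."
    print(round(lix(sample), 2))
    print(candidate_query_terms(sample, n=3))
```

A real system would need proper sentence segmentation and a stop-word list at the very least; raw frequency will happily suggest very common words as query terms.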
Well, I don’t understand the motive behind asking the user to give a URL and then doing all this. However, that’s not why I am writing this post.
I was left wondering (sitting on the banks of the Neckar, with pigeons playing in front of me … in big numbers) about the readability of Indian language texts. Has anyone experimented with that? Will the traditional readability measures used for English work well for Indian languages? Is it necessary to think about Computer Assisted Language Learning for, say, Telugu? Can Telugu be called an LPL (Less Prevalent Language)?
Then again, “What is readability?” and “How can someone measure general readability, isn’t there a personalized angle there?” are my perpetual questions. They are not specific to Indian languages.
However, at least if you let the imagination run high, there is very interesting scope for using this practically in language teaching (IMHO). Perhaps I’d write better if I could get over my reader's block 🙂