Speaking of the future: make sure no voice is left behind

The following article is authored by Robert Kirkpatrick.

As many of you may know, analysis of social media content has been a mainstay of our work at UN Global Pulse from early on. Over the past decade, social media has become a genuine force for empowerment, a powerful tool in the hands of its users for ensuring accountability. Just as social analytics in the private sector have spawned new business models and transformed customer engagement, public sector institutions now have the means – and the obligation – to make visible and actionable what once would have remained invisible until it was too late: rising food prices, a local disaster, or symptoms of an unknown disease.

For those on the wrong side of the digital divide, however – the nearly half of humanity that still has no access to the Internet – much of this does not apply. Not only does lack of access to information and digital services impede a community’s development, but it also means that the community is generating relatively little data that could be used to target and monitor the impact of development programmes.

We had to grapple with this paradox when we launched Pulse Lab Kampala in 2013. Uganda is a Least Developed Country, and as with many other countries in the region, Internet access is scarce. There is, however, a social media platform in rural areas of Uganda, as in most other developing countries, that is used widely for advocacy, public awareness, and education: talk radio. Across Uganda’s more than 250 FM radio stations, talk radio shows and interactive news programmes provide a vibrant platform for public engagement.

Could we somehow apply what we had learned from analysis of online social media to “take the pulse” of communities still largely offline?

The content of those shows is analogous to what you would find online, and it is similarly public. Back in 2013, we began experimenting with a branch of NLP (Natural Language Processing) called ASR (Automatic Speech Recognition, the AI-based technology that lets us interact with Alexa, Google, and Siri). What if we could automatically recognize and transcribe what was being said during those rural talk shows, and perhaps (as a stretch goal) even generate a rough translation into another language? The potential for everything from monitoring the impact of development programmes to rapid response to crises was huge.

The challenge with this idea – which we called “radio mining” – was that the languages spoken by communities on the wrong side of the digital divide, particularly in poor, rural communities, are indigenous languages that today aren’t computable. Linguists refer to these as “low resource languages.”

So, we set out to create our own training data, language models, and speech recognition from scratch, partnering with Stellenbosch University, and building our own speech engineering team. While the effort was significant, it worked: we created text and speech processing for two indigenous Ugandan languages, Luganda and Acholi – and in the years that followed, our radio mining activities expanded to other countries and languages. For those who are interested, you can read more here and here.

But...that’s not actually what this post is about.

What we realized along the way was that there is a far greater opportunity to use these technologies to achieve the SDGs. Forget radio mining: similar technology in a tablet could, for example, help teach a child in a rural village with no access to school to read and write in her own language.

The tablet’s AI, though, has to know the language, and there’s the rub: of the roughly 7,000 living languages, about 1,000 have at least 100,000 speakers. Yet only for 100 or so is there speech recognition of sufficient quality to allow interactive learning. Unfortunately, whether your language becomes computable is usually determined by a handful of trillion-dollar companies, based largely on whether they view speakers of your language as a lucrative market for targeted advertising. And the computability of the language(s) you speak has direct implications for whether and how you will be able to share in the smart, AI-powered digital future we keep hearing about.

Literally billions of people are paying a daily opportunity cost in unrealized SDG progress by not having access to this technology and the services it affords. This is also an issue of human rights, and while it is understandable that global efforts to close the digital divide and create true digital public goods have initially focused on fundamental issues such as Internet access and digital literacy, the pivotal role of NLP in digital inclusion hasn’t been properly recognized.

While these issues fall squarely into the domain of AI ethics, there is another reason I wanted to raise them as part of the UNESCO Inclusive Policy Lab. The computability of one’s mother tongue has implications not only for the future, but also for the deep past, embodied and ever-present in our richly diverse cultures and identities. Alongside the catastrophic impacts of climate change on biodiversity, there is another mass extinction underway: by the end of this century, more than half of those aforementioned 7,000 languages are expected to become extinct. Some put the proportion as high as 90%. The cultural loss represented by the extinction of even a single language is incalculable; as linguist Ken Hale put it, “When you lose a language, you lose a culture, intellectual wealth, a work of art. It’s like dropping a bomb on a museum, the Louvre.” And digitalization is playing a significant role in this ongoing catastrophe: of those 7,000 languages, a mere 10 have already come to represent more than 80% of the linguistic content online, with English and Chinese accounting for over half of that.

So let’s go back to the idea of a child who is growing up in a remote village, speaking only an indigenous language, with no access to school. Let’s imagine she has just been handed a tablet loaded with the best interactive educational apps available. If – and only if – that tablet knows her mother tongue, rather than, say, some regional lingua franca, she will become literate in that language, and go on to learn other subjects. She will also chat in that language with friends, play (or code!) games in it, and one day interact through NLP with myriad digital services that not only advance progress on the SDGs but also bring her language with her into a connected future in which they both can thrive.

So how do we make sure this happens? On one hand, it’s simply about fairness: as a speaker of a computable language, I know that if my toddler gets something stuck in his throat, one of ten devices in the house will tell me how to perform the Heimlich manoeuvre fast enough to save his life. But it’s also a remarkable impact investment opportunity: in the near term, the only way to give these languages a future is by giving their young speakers a future, and vice versa; and down the road, as digital inclusion benefits their communities, those NLP capabilities will have real business value.

We think it’s time to take on this challenge. Mozilla Labs is already cranking out some incredible open source tools, and Lacuna Fund is supporting the creation of training data for 29 languages. But time is not on our side, and so much is at stake. We need something along the lines of a global “Manhattan Project” to secure a future for thousands of languages and millions of kids. If you have ideas on how the UN could help or are interested in working with us on this, do reach out.

If we want to leave no one behind, we must leave no voice behind.

….

Robert Kirkpatrick is Executive Director of UN Global Pulse, an innovation initiative of the United Nations Secretary-General driving a big data revolution for global development and resilience. He has over 15 years of experience in developing technology solutions for public and private sector organizations, with a focus on organizational change.

The author is responsible for the facts contained in the article and the opinions expressed therein, which are not necessarily those of UNESCO and do not commit the Organization.

UNESCO Inclusive Policy Lab
