Microsoft Is Teaching AI to Speak Dholuo and Kikuyu, Starting with Kenyan Farmers

Project Gecko aims to make generative AI useful for the "global majority" by ditching English-first training data and focusing on local languages like Kikuyu and Dholuo.

Generative AI has a language problem. While models like GPT-4 are proficient in English, their performance drops off a cliff when they are asked to handle low-resource languages or cultural nuances specific to the “global majority” – the projected 85 percent of the world’s population that lives outside the West.

The issue isn’t just translation; it’s that the underlying training data simply doesn’t exist for many communities. Microsoft is attempting to solve this infrastructure gap with a new initiative called Project Gecko. Led by Microsoft Research Africa in Nairobi, alongside teams in India and the US, the project is building Small Language Models (SLMs) designed to run on low-cost devices, specifically tailored for local needs.

The project is starting with agriculture in Kenya and India, debuting a new multimodal AI agent that can watch videos, listen to vernacular speech, and answer farmers’ questions with verified accuracy.

The “Hallucination” Problem in Farming

The core problem Project Gecko aims to solve is the disconnect between generic AI models and on-the-ground reality. If a smallholder farmer in Nyeri asks a standard LLM a question in Kikuyu about crop diseases, the model might hallucinate an answer or provide advice relevant to US industrial farming, not local soil contexts.

To fix this, Microsoft has partnered with Digital Green, a global development organization that maintains a library of over 10,000 agricultural videos in more than 40 languages. Previously, this data was siloed and hard to search.

Project Gecko processes this data using a new system called the MultiModal Critical Thinking Agent (MMCTAgent).

Unlike a standard chatbot that just predicts the next word in a sentence, the MMCTAgent analyzes speech, images, and video content simultaneously. When a farmer asks a question, the agent doesn’t just generate text; it reasons across the available media and retrieves the specific video timestamp where the solution is demonstrated.

For example, a farmer can ask a verbal question in Kikuyu. The system processes the audio, finds the relevant visual answer in Digital Green’s database, and responds with text, audio, and a video clip.
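The retrieval step can be pictured with a toy example. The sketch below is not MMCTAgent's actual method, which the article does not detail; it assumes the speech has already been transcribed and simply matches the question against transcript segments by word overlap, returning the segment (and so the timestamp) that fits best. The `Segment` type, the `LIBRARY` entries, and the scoring function are all invented for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    video_id: str    # hypothetical identifier for a video in the library
    start_s: int     # segment start, in seconds
    end_s: int       # segment end, in seconds
    transcript: str  # what is said during this segment


def tokens(text: str) -> Counter:
    """Lowercase the text and count its words."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_segment(question: str, segments: List[Segment]) -> Optional[Segment]:
    """Return the segment whose transcript best matches the question, or None."""
    q = tokens(question)
    best, best_score = None, 0.0
    for seg in segments:
        score = cosine(q, tokens(seg.transcript))
        if score > best_score:
            best, best_score = seg, score
    return best


# Invented sample data; real transcripts would come from the video library.
LIBRARY = [
    Segment("maize-pests", 0, 45, "Introduction to common maize pests in the field"),
    Segment("maize-pests", 45, 120, "How to spot fall armyworm damage on maize leaves"),
    Segment("tomato-care", 30, 90, "Watering tomato seedlings in the dry season"),
]

hit = best_segment("How do I spot armyworm damage on my maize?", LIBRARY)
if hit is not None:
    print(f"Play {hit.video_id} from {hit.start_s}s to {hit.end_s}s")
```

A production system would likely rely on multilingual embeddings and visual features rather than word overlap, but the shape of the answer is the same: a video plus the timestamp where the solution appears.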

Microsoft says field studies in Kenya and India showed this method significantly improved user trust compared to generic AI, largely because the AI “shows its work” by grounding answers in real video content rather than abstract training data.

Small Models and Local Data

The technical backbone of Project Gecko relies on Small Language Models (SLMs) rather than massive, cloud-heavy LLMs.

There are two reasons for this. First, internet bandwidth and device capabilities in rural Kenya are limited; SLMs require less computing power. Second, large datasets for languages like Kalenjin or Maa are virtually non-existent, making it impossible to train massive models effectively.
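The first reason comes down to simple arithmetic: the memory needed just to hold a model's weights is roughly the parameter count multiplied by the bytes used per parameter. The sketch below works that out for a few sizes. The parameter counts and bit widths are illustrative assumptions, not figures for Project Gecko's models, which the article does not give.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Illustrative sizes only (assumed, not taken from the article).
examples = [
    ("3B parameters, 16-bit", 3e9, 16),
    ("3B parameters, 4-bit", 3e9, 4),
    ("70B parameters, 16-bit", 70e9, 16),
]

for label, params, bits in examples:
    print(f"{label}: about {weight_memory_gb(params, bits):.1f} GB")
```

This counts weights only; running a model also needs memory for activations and caches, so real requirements are higher.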

The Project Gecko team had to build the tools for Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) from scratch. They compiled a dataset of 3,000 hours of crowd-sourced Kenyan speech to train and fine-tune these models.

Currently, the system supports Swahili, Kalenjin, Dholuo, Maa, Kikuyu, and Somali.

“Agriculture has very specific terms, which may change from language to language, and even district to district,” says Tanuja Ganu, Director of Research Engineering at Microsoft India. “All those domain-specific nuances need to be understood.”

Open Source and Future Plans

Microsoft is positioning this as a foundation for broader application, not just a proprietary tool. The MMCTAgent is available now on Azure AI Foundry Labs, and the code has been open-sourced on GitHub.

The company is also creating a public leaderboard to benchmark how well AI models perform in African languages, attempting to create a standard for accuracy that currently doesn’t exist.
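The article does not say which metrics the leaderboard will use. For speech recognition, one widely used measure is word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. A minimal sketch, with placeholder words standing in for real transcripts:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two transcripts, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens: one substituted word and one dropped word out of four.
print(word_error_rate("a b c d", "a x c"))
```

Lower is better, and a score of 0 means the transcript matches the reference word for word. WER can exceed 1 when the model inserts many extra words.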

While agriculture is the test bed due to its economic importance in Kenya – where it contributes significantly to GDP – Microsoft plans to replicate this “design pattern” for healthcare and education. The idea is to create a playbook for developers to build domain-specific AI tools that don’t rely on Silicon Valley-centric data.

The team is currently using insights from 130 farmers to refine features like clarifying questions and peer-to-peer knowledge sharing.

“Building AI systems from the ground up shaped by the knowledge, languages, and modalities of the global majority yields more innovative, useful solutions for a great number of people,” says Ashley Llorens, VP and Managing Director of the Microsoft Research Accelerator.

For now, the focus remains on proving that a farmer in Nyeri can get an answer as accurate as the one a software engineer in Seattle would get.

Dickson Otieno

I love reading emails when bored. I am joking. But do send them to editor@tech-ish.com.
