AswathyVinod edited this page Nov 5, 2024 · 12 revisions

Anudesh

An open source platform to annotate and label LLM data at scale

License: MIT

Anudesh is an open-source platform for annotating Large Language Model (LLM) data at scale, built with the vision of enhancing the digital presence of under-represented languages in India. It assists in collecting prompt data that will help create a benchmark for evaluating how well LLMs like ChatGPT perform in Indian contexts.

Why Anudesh?

As large language models (LLMs) like ChatGPT and others continue to evolve, there is an increasing need to evaluate how effectively they understand and generate responses in diverse linguistic and cultural contexts. While LLMs have shown remarkable progress in handling mainstream languages like English, they often struggle when it comes to less globally dominant languages and specific cultural nuances.

Anudesh is an essential tool in bridging this gap. Developed specifically for Indian contexts, it addresses the need to evaluate how well LLMs perform when faced with the rich linguistic diversity of India. India is home to hundreds of languages and dialects, each carrying unique cultural and contextual subtleties. Tools like Anudesh help ensure that LLMs can accurately understand and respond to prompts in these diverse settings.

The key reasons for Anudesh's development are:

  • Representation of Indian Languages: Indian languages are underrepresented in existing benchmarks for LLMs. Anudesh helps collect prompt data that includes a wide variety of Indian languages and dialects, ensuring that these models are tested comprehensively in the Indian context.

  • Cultural Context Sensitivity: Language is not just about words; it is deeply intertwined with culture. Anudesh allows us to evaluate whether LLMs can handle Indian-specific cultural references, idioms, and local knowledge.

  • Improved Accuracy and Inclusivity: By benchmarking LLM performance with prompts designed for the Indian context, Anudesh pushes for more accurate and inclusive models that are better suited for users across different regions of India.

  • Setting Standards: Anudesh facilitates the creation of benchmarks that can serve as industry standards for evaluating LLMs in non-Western contexts, ensuring that these models are truly global in their application.

In short, Anudesh plays a critical role in refining and improving LLMs by holding them accountable to the linguistic and cultural realities of India. This will lead to more accurate, responsive, and relevant AI interactions for millions of Indian users.

Goals

  • Create a comprehensive benchmark for evaluating large language models (LLMs) like ChatGPT in Indian linguistic and cultural contexts.

  • Gather diverse prompt data across multiple Indian languages, dialects, and cultural settings to reflect the unique diversity of India.

  • Evaluate the performance of LLMs in understanding, interpreting, and responding to India-specific queries, idioms, and cultural references.

  • Promote inclusivity and representation by ensuring that LLMs are better trained and tested in underrepresented Indian languages.

  • Enhance the accuracy and relevance of AI models when applied to Indian users by identifying gaps in current LLM capabilities.

  • Drive AI improvements by providing critical insights and data that help refine LLMs for more robust and context-aware performance in Indian environments.

  • Establish industry standards for evaluating LLMs in non-Western contexts, making Indian languages and cultural nuances a priority in AI development.

Features of Anudesh

Anudesh offers a range of features designed to facilitate the creation and evaluation of prompts for large language models (LLMs) in Indian contexts. Two core project types and additional functionalities make it a comprehensive tool for benchmark creation:

1. Instruction-Driven Chat Type

  • In this project type, annotators are provided with a specific instruction.
  • The annotators then submit a prompt that aligns with the given instruction.
  • This approach helps ensure that the prompt data collected is both varied and relevant to the types of queries and interactions users might have with LLMs in real-world scenarios.
  • An additional feature called ‘Hint’ is available to annotators. It provides examples and supplementary information to guide annotators in creating appropriate prompts based on the instruction.

2. Model Output Evaluation Type

  • In this project type, annotators evaluate the model’s response to a prompt.
  • Annotators are given both the prompt and the model’s generated response.
  • They assess the quality of the response by answering a set of questions that target various evaluation criteria, such as accuracy, relevance, and cultural sensitivity.
  • This ensures a structured and thorough evaluation of LLM outputs in Indian contexts.
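The evaluation workflow above can be sketched as a simple data record. This is a hypothetical illustration, not Anudesh's actual data model: the class name, the criteria names, and the assumed 1-to-5 rating scale are all my own.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a single annotator's evaluation of one
# model response; the real Anudesh schema may differ.
@dataclass
class OutputEvaluation:
    prompt: str
    model_response: str
    # criterion name -> rating on an assumed 1-5 scale
    ratings: dict = field(default_factory=dict)

    def rate(self, criterion: str, score: int) -> None:
        if not 1 <= score <= 5:
            raise ValueError("score must be between 1 and 5")
        self.ratings[criterion] = score

    def average_score(self) -> float:
        return sum(self.ratings.values()) / len(self.ratings)

ev = OutputEvaluation(
    prompt="What is Onam?",
    model_response="Onam is a harvest festival celebrated in Kerala.",
)
ev.rate("accuracy", 5)
ev.rate("relevance", 4)
ev.rate("cultural sensitivity", 5)
```

Structuring each evaluation this way makes it straightforward to aggregate scores per criterion across many annotators when building the benchmark.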

Additional Features

Workspace Management: Anudesh uses a hierarchical system to organize work into organizations, workspaces, and projects, making it easy to manage large-scale prompt data collection and evaluation across different teams and tasks.
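The organization → workspace → project hierarchy can be sketched as nested records. This is an illustrative sketch only; the class and field names are assumptions, not Anudesh's actual schema.

```python
from dataclasses import dataclass, field

# Illustrative model of the three-level hierarchy described above.
@dataclass
class Project:
    name: str

@dataclass
class Workspace:
    name: str
    projects: list = field(default_factory=list)

@dataclass
class Organization:
    name: str
    workspaces: list = field(default_factory=list)

    def all_projects(self):
        # Flatten every project across the organization's workspaces.
        return [p for ws in self.workspaces for p in ws.projects]

org = Organization("Example Org")
ws = Workspace("Hindi Prompts")
ws.projects.append(Project("Instruction-Driven Chat"))
ws.projects.append(Project("Model Output Evaluation"))
org.workspaces.append(ws)
```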

Transliteration Feature: Anudesh simplifies input with transliteration options using IndicXlit models, supporting 20+ Indian languages. Annotators can input text in Roman characters, and the system will transliterate it into the appropriate Indian script, making the tool user-friendly for multilingual work.
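To illustrate the idea of Roman-to-Indic-script conversion, here is a toy greedy longest-match transliterator. Anudesh itself uses AI4Bharat's IndicXlit models, which are learned transliteration systems far more capable than this hand-written mapping; the table below is a deliberately tiny stand-in for demonstration only.

```python
# Toy Roman -> Devanagari mapping; real transliteration uses trained
# models (IndicXlit), not a lookup table like this.
ROMAN_TO_DEVANAGARI = {
    "namaste": "नमस्ते",  # whole-word entry for the demo
    "ka": "क",
    "a": "अ",
}

def transliterate(word: str) -> str:
    out, i = [], 0
    while i < len(word):
        # Try the longest matching Roman chunk first.
        for j in range(len(word), i, -1):
            chunk = word[i:j]
            if chunk in ROMAN_TO_DEVANAGARI:
                out.append(ROMAN_TO_DEVANAGARI[chunk])
                i = j
                break
        else:
            out.append(word[i])  # pass through unmapped characters
            i += 1
    return "".join(out)
```

The point of the sketch is the workflow: the annotator types Roman characters, and the system replaces them with the target script so annotators without an Indic keyboard can still contribute.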

Reports and Analytics: Anudesh generates reports and analytics at various levels, including date-based summaries, helping organizations track progress and assess the quality and quantity of the work being done.
