About Deccan AI
Deccan AI is a high-growth, venture-backed AI model training and evaluation company headquartered in the Bay Area. Founded by alumni of IIT Bombay and IIM Ahmedabad and former Google employees, we partner with the world’s top frontier AI labs, including Google DeepMind, Snowflake, and several cutting-edge research groups. We are backed by Prosus Ventures, and our India office is based in Hyderabad.
We’re not just participating in the AI race; we’re building the infrastructure that powers it.
With 1M+ global experts, advanced automation, and vertically integrated platforms, we deliver the gold-standard data that world-class AI models rely on. The AI data annotation market is exploding, set to quadruple by 2032. The opportunity? Massive, and you can help define the future.
Job Description:
1. Background
The ML Researcher plays a critical role in keeping the organization at the forefront of AI innovation through deep, end-to-end research. This role is focused on identifying emerging research directions, designing novel AI benchmarks, and creating high-quality evaluation datasets aligned with both cutting-edge research and organizational priorities.
The ML Researcher bridges the gap between theoretical research and practical application by translating insights from the latest AI papers into actionable benchmarks and evaluation frameworks. The role emphasizes deep research, synthesis of academic and industry findings, and the design of solutions that meaningfully advance AI systems—particularly in agent frameworks, machine learning models, and emerging AI paradigms.
2. Purpose of the Role
The primary objective of the ML Researcher is to conduct comprehensive AI research and design novel, scalable, and real-world–relevant benchmarks and evaluation datasets. The role aims to push beyond existing evaluation practices by defining new criteria that better measure the capabilities and limitations of modern AI systems.
3. Key Responsibilities
1. Research & Literature Review
Continuously track the latest AI research, conferences, and publications.
Conduct in-depth literature reviews across emerging AI models, agent frameworks, and evaluation methodologies.
Identify gaps, limitations, and opportunities in existing benchmarks.
Assess current industry benchmarks for relevance to organizational and client goals.
2. Benchmark & Evaluation Design
Propose novel AI benchmarks informed by research insights and industry gaps.
Design evaluation datasets that assess AI systems across both coding and non-coding domains.
Ensure benchmarks and datasets are innovative, scalable, and practically applicable.
Define meaningful evaluation metrics that provide actionable insights into model performance.
3. Documentation & Research Deliverables
Create high-level requirement documents for each benchmark or dataset, covering:
Problem statement and motivation
Design overview and structure
Evaluation metrics and success criteria
Testing and validation guidelines
Ensure documentation is clear, comprehensive, and implementation-ready.
4. Cross-Functional Collaboration
Work closely with the ML Lead, pipeline-focused MLEs, and project managers to align research outputs with organizational priorities.
Collaborate with internal stakeholders to ensure benchmarks meet project goals and industry standards.
Provide feedback and iterative improvements based on implementation outcomes.
5. Continuous Innovation & Iteration
Refine and evolve benchmarks and datasets based on feedback, new research, and emerging use cases.
Propose enhancements to existing evaluation methods, including new metrics or benchmark variations.
Stay actively engaged with the AI research community through conferences, discussions, and ongoing learning.
4. Deliverables
The ML Researcher is expected to produce high-impact, actionable research outputs, including:
1. Benchmark Proposal Documents
Each proposal should include:
Clear problem definition and motivation
Detailed benchmark design and structure
Defined evaluation metrics
Testing and validation guidelines
Explanation of novelty compared to existing benchmarks
2. Evaluation Dataset Designs
Dataset overview, structure, and intended use
(Optional) Data collection, labeling, and cleaning methodology
Evaluation methodology and expected outcomes
Unique or differentiating features of the dataset
3. Research Reports & Whitepapers (Optional)
Periodic summaries of research findings or emerging trends
Internal documentation of best practices and benchmarking insights
4. Feedback & Iteration Reports
Post-implementation assessments of benchmark effectiveness
Recommendations for improvements based on team and client feedback
Iterative updates aligned with new research and usage insights
5. Timeline & Milestones
Weeks 1–2:
Complete initial research and first benchmark + dataset proposal
Present draft documentation for review
Weeks 3–4:
Establish a cadence of at least one new benchmark and one evaluation dataset per month
Submit finalized documentation for internal review
Weeks 5–6:
Incorporate feedback and finalize benchmark and dataset proposals
Complete testing and validation guidelines
Ongoing:
Regularly iterate on existing benchmarks
Provide monthly research summaries on emerging AI trends impacting evaluation
6. Expected Output & Impact
Innovative Benchmarks & Datasets: Continuous delivery of novel, high-quality evaluation frameworks.
Strong Documentation: Clear, actionable requirements enabling efficient implementation.
Industry Impact: Contributions that elevate evaluation standards and strengthen the organization’s research credibility.
Client Enablement: Research-driven inputs that support effective, client-facing AI solutions.
7. Seniority & Skillset Requirements
The ideal ML Researcher will have:
Deep expertise in AI/ML domains including agent frameworks, deep learning, NLP, computer vision, and emerging AI systems.
Proven experience designing benchmarks and evaluation datasets from the ground up.
Strong research and analytical skills with the ability to translate papers into practical solutions.
Excellent documentation and communication abilities.
Ability to work independently while collaborating effectively with cross-functional teams.
Interview Process:
| Round | Focus | What is evaluated |
| --- | --- | --- |
| Round 1 | Project Deep Dive | Real project experience and technical understanding |
| Round 2 | Execution & Technical Depth | Practical ML/LLM implementation ability |
| Round 3 | Culture & Team Fit | Communication, mindset, and team compatibility |