JD: Machine Learning Engineer
Location: Hyderabad
About Us
Deccan AI, founded by IIT Bombay and IIM Ahmedabad alumni, specializes in LLM development and AI-first scaled operations. Based in San Francisco and Hyderabad, we pursue our mission of creating AI for Good, driving innovation with positive societal impact.
About the Role
We are seeking a Machine Learning Engineer focused on Data Quality to ensure our model training data meets the highest standards of reliability, relevance, and safety. This role is pivotal across the ML lifecycle, from automated QA of training data to developing evaluation strategies and leading rater workflows, ensuring that the data we ship aligns closely with client expectations and model performance objectives.
You will be at the intersection of engineering, research, and client success, acting as the final quality gatekeeper for datasets powering LLM fine-tuning, reward modeling, and evaluation.
Key Responsibilities

Dataset Quality Automation
Automate quality assurance pipelines for supervised fine-tuning (SFT) transcripts and RLHF (reinforcement learning from human feedback) preference pairs.

Implement schema validation, semantic overlap checks, and embedding-based deduplication (a minimal sketch follows this list).

Integrate filters for safety, toxicity, and reward-signal balance in datasets.
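
For concreteness, here is a minimal sketch of what an embedding-based deduplication pass might look like. The sentence-transformers model and the 0.92 similarity threshold are illustrative assumptions, not requirements of the role:

```python
# Minimal embedding-based deduplication sketch. Assumes the
# sentence-transformers library; the model name and threshold
# below are illustrative choices, not prescribed by this JD.
import numpy as np
from sentence_transformers import SentenceTransformer

def dedupe(texts: list[str], threshold: float = 0.92) -> list[str]:
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed small encoder
    # normalize_embeddings=True makes dot products equal cosine similarity
    emb = model.encode(texts, normalize_embeddings=True)
    kept_idx: list[int] = []
    for i, vec in enumerate(emb):
        # keep a sample only if it is not a near-duplicate of anything kept so far
        if all(float(np.dot(vec, emb[j])) < threshold for j in kept_idx):
            kept_idx.append(i)
    return [texts[i] for i in kept_idx]
```

In production this quadratic scan would typically be replaced with an approximate-nearest-neighbor index, but the keep-if-not-similar logic is the same.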

Training & Benchmarking
Execute proxy fine-tuning (LoRA/QLoRA) on open-source LLMs using QA-approved datasets (see the configuration sketch after this list).

Train lightweight reward models and track performance via public/internal benchmarks and calibration metrics.
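
As a rough sketch of a proxy fine-tune using Hugging Face peft, where the base model and hyperparameters are placeholder choices, not mandated ones:

```python
# LoRA proxy fine-tune setup with Hugging Face peft.
# Base model and hyperparameters are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "mistralai/Mistral-7B-v0.1"  # illustrative open-source base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora = LoraConfig(
    r=8,                 # small adapter rank, since this is only a proxy run
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # sanity check: only adapter weights train
```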

LLM Evaluation
Orchestrate human and LLM-as-judge evaluations, including critique generation and scoring.

Design evaluation rubrics focused on consistency, helpfulness, and alignment with reward models.

Calculate and interpret statistical measures such as binomial confidence intervals for evaluation scores (worked example below).
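
To make the last point concrete, here is a worked example using the Wilson score interval, one standard binomial confidence interval for a pass rate over n judged samples:

```python
# Wilson score interval for a binomial proportion (e.g., a pass rate
# over n judged responses). Standard formula, shown for concreteness.
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# e.g., 78 of 100 responses rated "helpful" -> roughly (0.69, 0.85)
print(wilson_interval(78, 100))
```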

Annotation & Rater Management
Build a continuous feedback loop with annotation teams, resolve disputes, and maintain high annotation quality (one common agreement check is sketched after this list).

Manage human evaluation workflows to maximize consistency and throughput.
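
One common way to monitor annotation consistency is inter-annotator agreement via Cohen's kappa; the metric choice here is our illustration, not a requirement of the role:

```python
# Inter-annotator agreement via Cohen's kappa (scikit-learn).
# The labels and threshold interpretation are illustrative.
from sklearn.metrics import cohen_kappa_score

rater_a = ["good", "bad", "good", "good", "bad"]
rater_b = ["good", "bad", "bad",  "good", "bad"]

kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Cohen's kappa: {kappa:.2f}")  # ~0.62 here; low values flag rater pairs for review
```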

Research & Tooling
Prototype new signal-to-noise metrics, e.g., reward model entropy and preference flip rate (the latter is sketched after this list).

Package tooling into reproducible notebooks and integrate it into CI and orchestration pipelines (Airflow/Dagster).
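
A minimal sketch of the preference-flip-rate idea, assuming preference pairs stored as parallel arrays of reward-model scores for the human-chosen and human-rejected responses (that data layout is our assumption):

```python
# Preference flip rate: the fraction of preference pairs where the
# reward model scores the human-rejected response above the
# human-chosen one. Data layout (parallel score arrays) is assumed.
import numpy as np

def preference_flip_rate(chosen_scores: np.ndarray, rejected_scores: np.ndarray) -> float:
    """chosen_scores[i] / rejected_scores[i] are reward-model scores for the
    human-preferred and human-rejected response of pair i."""
    flips = chosen_scores < rejected_scores  # reward model disagrees with the rater
    return float(flips.mean())

# e.g., one disagreement in three pairs -> 0.33; a high flip rate flags
# noisy pairs or a miscalibrated reward model
print(preference_flip_rate(np.array([1.2, 0.4, 2.0]), np.array([0.9, 0.8, 1.1])))
```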

End Value to the Company

You will serve as the client-facing MLE advocate, ensuring that all training and evaluation datasets are aligned with downstream needs. Your work will directly influence model performance, client satisfaction, and data-driven improvements to our ML systems.
Required Skills & Qualifications

Strong understanding of LLM training and evaluation pipelines (SFT, RLHF, reward modeling).

Experience with model performance diagnostics and identifying root causes of model behavior (e.g., data flaws, prompt issues).

Skilled in prompt engineering, dataset schema design, and annotation guideline development.

Proficient in Python, with experience using PyTorch, Hugging Face Transformers, and FastAPI.

Comfortable building evaluation frameworks, including leaderboards and domain-specific test sets.

Familiarity with model evaluation metrics, clustering techniques, embedding models, and data drift detection.

Strong communication skills, especially in translating technical findings into actionable client insights.

Self-starter with a consultative mindset who can operate across technical and business domains.

Nice-to-Have

Experience with embedding similarity, data deduplication, or dataset filtering for toxicity/safety.

Prior work on LLM-as-judge systems or human alignment evaluations.

Familiarity with CI/CD for data workflows and orchestration tools like Airflow or Dagster.