Hello, I'm

Dmitrii Kuzmin

NLP Engineer crafting efficient tooling for large language models.

About Me

NLP / ML engineer and researcher focused on tokenizer adaptation, LLM fine-tuning, and evaluation.

I iterate quickly on open-source model stacks, building efficient training and inference workflows for multilingual use cases. I enjoy collaborating with cross-functional teams and translating research insights into production impact.

My Skills

Balanced toolkit across research, engineering, and deployment for large language models.

NLP

  • PyTorch, Transformers, Tokenizers
  • LangChain, custom inference tooling
  • Dataset curation & evaluation pipelines

DevOps

  • Git & collaborative workflows
  • Dockerized training/inference stacks
  • GPU optimization & monitoring

Back-end

  • MongoDB data modeling
  • Telegram API integrations
  • RESTful service design

Languages

  • English (working proficiency)
  • Russian (native)

Soft Skills

  • Flexibility & ownership
  • Responsible delivery
  • Team enthusiasm & knowledge sharing

Experience

Hands-on roles delivering large language model research, productionization, and tooling.

Research Intern · Mohamed bin Zayed University of Artificial Intelligence

Abu Dhabi, United Arab Emirates · Jun 2025 – present
  • Designing alternative tokenization strategies to boost LLM inference quality and efficiency.
  • Co-authoring an academic paper on tokenizer-driven performance gains for multilingual models.

Middle NLP Engineer · DeepPavlov

Moscow, Russia · May 2025 – present
  • Running R&D initiatives and evaluation workflows.
  • Building testing harnesses for instruction-tuned models.
  • Testing inference pipelines on GPUs from Chinese manufacturers.

Middle NLP Engineer · Center for Applied AI, Skolkovo

Moscow, Russia · Feb 2025 – May 2025
  • Fine-tuned the Qwen2.5-VL model for multimodal requirement analysis.
  • Implemented an end-to-end pipeline.
  • Designed prompting strategies to generate actionable feedback on heterogeneous specifications.

NLP Researcher · Higher School of Economics

Moscow, Russia · Jun 2024 – May 2025
  • Fine-tuned Llama 3 8B Instruct for Russian-language generation tasks.
  • Developed a Russian BPE tokenizer and tooling to manipulate existing vocabularies safely.
  • Built a grammar benchmark suite to quantify improvements across downstream tasks.

ML Engineer & Backend Engineer · Moscow Aviation Institute

Moscow, Russia · Jul 2023 – Oct 2023
  • Delivered a sentence-theme classification model for aviation-specific communications.
  • Optimized MongoDB queries powering reporting dashboards.
  • Integrated Telegram API services to streamline data ingestion.

NLP Engineer · Innopolis University

Innopolis, Russia · Jun 2023 – Jul 2023
  • Built a deep learning pipeline for sentiment analysis on YouTube comments.
  • Fine-tuned BERT models to improve classification accuracy on imbalanced datasets.

My Projects

Publications

Researching tokenizer adaptation and cost-efficient strategies for large language models.

Rethinking Tokenization: Improving Language Model Performance by Modifying the Input Module at Inference

EACL 2026 · Researcher & Writer · Jun 2025 – present · Under review

Investigates how alternative tokenizations of the same text impact LLM inference quality.

TokenSubstitution: Cost-efficient Method for LLM Adaptation for the Russian Language

ACL 2026 · Researcher & Writer · Feb 2025 – present · In progress

Proposes a cost-effective adaptation approach for improving LLM generation performance in the target language.

A Multi-Aspect Evaluation of Tokenizer Adaptation Methods for Large Language Models

Russian AI Journey 2025 · Researcher & Writer · Jun 2024 – Jun 2025 · Accepted

Demonstrates tokenizer adaptation as a cost-effective technique by analyzing text quality and token efficiency across diverse benchmarks.

Education

Grounded in data analysis, software systems, and human-centered AI design.

B.S. in Data Analysis and Artificial Intelligence

Innopolis University · 2022 – 2026
  • Core courses: Software Systems Analysis & Design, Human-AI Interaction Design, Mathematical Analysis.
  • Focus on applied machine learning, large language models, and productized AI systems.

Extracurricular Activity

Tutor · Innopolis University · Sep 2023 – Jan 2024
  • Helped first-year students adapt to the university's academic and cultural environment.
  • Organized extracurricular events to foster community and peer learning.

Contact Me

Let’s collaborate on tokenizer research, LLM evaluation, or productizing AI workflows.

Created by Dmitrii Kuzmin