Girishkumar Ponkiya
Data Scientist (Vice President) · Applied AI & NLP
From Research to Production: Teaching Machines · Training Developers
Ph.D. · IIT Bombay
ACL · EMNLP · COLING · LREC
NLP & LLM Systems
Applied AI meets research rigour.

As an AI/ML specialist at ISS STOXX's Technology Innovation Lab, I work on LLM infrastructure and NLP pipelines — from deploying and optimising models on internal GPU infrastructure to building systems that process ~700K news articles daily across 10+ languages for ESG controversy monitoring.

My doctoral research at IIT Bombay on Noun Compound Interpretation addressed two complementary aspects. The first is automatic interpretation: uncovering implicit semantic relations between nouns through supervised LSTM paraphrasing and through unsupervised approaches with BERT/RoBERTa and T5, with both unsupervised approaches outperforming prior supervised baselines. The second is a linguistic contribution: proposing FrameNet-based semantic labels and demonstrating that FrameNet data can generalize label sets and enable models to handle unseen relation labels. This work was published at ACL, EMNLP, COLING, and LREC.

I also design and deliver a company-wide "Introduction to LLM for Software Developers" workshop series (~70 participants per session) and advise internal teams on AI use-case design and implementation. I contribute to open-source NLP tooling (Hugging Face Transformers) and have open-sourced a vLLM monitoring dashboard built on Telegraf, InfluxDB, and Grafana.

From research to production AI.
Data Scientist — ISS STOXX, Technology Innovation Lab
Mumbai, India · Mar 2021 – Present
Individual contributor; cross-functional AI advisor. Associate Vice President (Mar 2021 – Dec 2025) → Vice President (Jan 2026 – Present).
NLP & ESG pipelines: Built and maintain the NLP components of the news filtering pipeline (NER, entity linking, relevance scoring) — the pipeline processes ~700K articles/day across 10+ languages; relevance rate grew from ~2% to ~35% over 4 years.
LLM infrastructure: Deployed multiple LLMs on internal GPU infrastructure using vLLM with production-grade optimisations. Built and open-sourced the vllm-dashboard monitoring stack.
AI enablement: Delivering a company-wide "Introduction to LLM for Software Developers" workshop series (~70 participants/session); advising internal dev teams on AI use-case design and implementation.
Index products: Leading AI-driven thematic index and sentiment index initiatives for STOXX.
Technical contributions (NER · LLM · Knowledge Graphs)
  • Evaluated 5 NER tools against internal ESG data; integrated spaCy, CoreNLP, and Diffbot into the production NiFi pipeline.
  • Improved the pipeline's relevance classifier from 82%+ to 85%+ accuracy by replacing a spaCy-based component with a tuned scikit-learn model (feature engineering + hyperparameter search).
  • Contributed to the Japanese-language expansion, raising relevance from ~9% to ~33%.
  • Devised a novel threshold identification approach enabling mathematical confidence bounds on relevance and false positive rates — previously unmeasurable.
  • Redesigned entity linking around a Memgraph knowledge graph for better adaptability to dynamic entity universes.
  • Deployed LLMs (including Gemma3:27B) with FP8 quantization, KV-cache quantization, and multi-GPU tensor parallelism. Identified a CPU-level bottleneck (missing AVX2/AVX-512), which led to a hardware upgrade.
  • Built an AI-driven document-processing pipeline for the thematic index: PDF → Markdown (Docling) → section filtering → LLM summarisation.
  • Developed aspect-based sentiment analysis for the AI-driven STOXX sentiment index using E3 APIs and Vespa-backed vector embeddings.
Machine Learning Consultant — UnFound
Mumbai, India · Dec 2018 – Jun 2019 · Part-time
Part-time engagement during PhD research, taken deliberately to gain industry exposure and bridge the transition from academia to applied ML.
QA pipeline: Redefined the question-answering pipeline using BERT.
Stance detection: Developed a multi-task learning approach for stance detection.
Where theory meets tooling.
NLP & ML
NER & entity linking · RAG pipelines · Knowledge graphs · Sentiment analysis · Text classification · Noun compound interpretation
Infrastructure & Systems
vLLM · Grafana / Telegraf / InfluxDB · Kubernetes · Apache NiFi · Vespa · Docling
Frameworks & Languages
Python · PyTorch · Hugging Face Transformers · spaCy · CoreNLP · Diffbot · scikit-learn · Java · SQL
Academic foundations.
Ph.D., Computer Science (Natural Language Processing)
Indian Institute of Technology Bombay (IIT Bombay), Mumbai
Thesis submitted: Mar 2021 · Degree awarded: 2023 · CPI: 7.7
Thesis: Noun Compound Interpretation. Supervisors: Prof. Pushpak Bhattacharyya (IIT Bombay) and Mr. Girish K. Palshikar (Tata Research Development and Design Centre (TRDDC), TCS Research).
Thesis arc: supervised LSTM paraphrasing → BERT/RoBERTa unsupervised MLM → T5 free-form generation. The unsupervised approaches outperformed prior supervised baselines without labelled data, which is the key finding of the thesis.
Best Poster (Technical), Research-Scholar Mela 2016, CSE Dept., IIT Bombay.
First Prize, poster session, RISC 2016.
Research details

The thesis addressed Noun Compound Interpretation (NCI): automatically uncovering implicit semantic relations between component nouns (e.g., student protest → "student is the doer of the protest"). The work spans two distinct axes:

1. Automatic interpretation: Multiple approaches of increasing sophistication — (a) supervised LSTM-based prepositional paraphrasing; (b) unsupervised fill-in-the-blank using BERT/RoBERTa; (c) unsupervised free-form paraphrasing using T5 — the unsupervised models outperformed prior supervised baselines.

2. Linguistic contribution: Proposed FrameNet-based semantic relation labels for noun compounds, and demonstrated that FrameNet data can be leveraged to generalize the label space — enabling interpretation models to handle unseen relation labels via zero-shot learning (ConvE embeddings over FrameNet).
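As a rough illustration of the fill-in-the-blank idea in (b), the sketch below turns a noun compound into a masked template and ranks candidate prepositions by a score. It is a minimal sketch under stated assumptions, not the thesis implementation: the preposition list, the template shape, and the names `build_template`, `rank_prepositions`, and `toy_score` are all hypothetical, and `toy_score` is a hand-written stand-in for the probability a masked language model such as BERT would assign to a candidate at the mask position.

```python
from typing import Callable, List, Tuple

# Hypothetical candidate set; the inventory used in the thesis is not given here.
PREPOSITIONS = ["of", "for", "in", "by", "from", "with", "about", "on"]


def build_template(modifier: str, head: str, mask: str = "[MASK]") -> str:
    """Turn a compound 'modifier head' into a paraphrase template 'head [MASK] modifier'."""
    return f"{head} {mask} {modifier}"


def rank_prepositions(
    modifier: str,
    head: str,
    score_fn: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score every candidate preposition for the template and sort best-first."""
    template = build_template(modifier, head)
    scored = [(prep, score_fn(template, prep)) for prep in PREPOSITIONS]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_score(template: str, candidate: str) -> float:
    """Stand-in scorer with made-up values so the sketch runs offline.

    A real system would return the masked LM's probability of `candidate`
    at the mask position of `template`.
    """
    toy_table = {
        ("protest [MASK] student", "by"): 0.6,
        ("protest [MASK] student", "of"): 0.3,
    }
    return toy_table.get((template, candidate), 0.01)


if __name__ == "__main__":
    for prep, score in rank_prepositions("student", "protest", toy_score)[:3]:
        print(f"protest {prep} student\t{score:.2f}")
```

Swapping `toy_score` for a function backed by an actual masked language model would keep the ranking logic unchanged, which is why the scorer is passed in as a parameter.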

Relevant courses: Foundation of ML, NLP and the Web, Artificial Intelligence, Topics in NLP (Machine Translation).

Research originated during a 2014 TRDDC internship. Internship mentor Girish K. Palshikar became Ph.D. co-supervisor — a direct intellectual lineage from internship to completed PhD.

M.Tech., Computer Science
Indian Statistical Institute (ISI), Kolkata · 2011–2013 · 78.54%
Dissertation: Priority Search Tree for Secondary Memory and its Application (Q-MER Problem). Supervisor: Prof. Subhas C. Nandy.
Dissertation nominated for the Best Dissertation Award.
Dissertation details

Addressed the Q-MER (Query Maximal Empty Rectangle) problem in external memory — finding the largest axis-parallel empty rectangle around a query point among massive point sets that exceed main memory.

Contributions: (1) proposed an algorithm that builds an external priority search tree using linear space in log-linear time; (2) showed that all standard range queries can be answered efficiently; (3) gave an algorithm that solves MER with O(log n) extra space in main memory.

Relevant topics: Probabilistic Reasoning · Pattern Recognition · Algorithms & Complexity · Optimisation Techniques · Automata & Formal Languages

B.E., Computer Engineering
Saurashtra University, Rajkot · 2005–2009
Projects: GridApps (web application for distributed batch processing on a grid) and Web-PC (J2ME + web application for remote PC access).
Web-PC selected in Top 10 at C2C Project Competition by the Government of Gujarat, TCS, and Microsoft.
Competitive Exams
GATE 2013 — 99.78 percentile · All India Rank 495
GATE 2011 — All India Rank 875 (M.Tech. admission at ISI Kolkata)
JEST 2013 — Rank 65
Peer-reviewed research.
  • ACL 2021 · Girishkumar Ponkiya, Diptesh Kanojia, Pushpak Bhattacharyya, Girish K. Palshikar. "FrameNet-assisted Noun Compound Interpretation." Findings of ACL-IJCNLP 2021.
  • EMNLP 2020 · Girishkumar Ponkiya, Rudra Murthy, Pushpak Bhattacharyya, Girish K. Palshikar. "Looking inside Noun Compounds: Unsupervised Prepositional and Free Paraphrasing using Language Models." Findings of EMNLP 2020.
  • COLING 2018 · Girishkumar Ponkiya, Kevin Patel, Pushpak Bhattacharyya, Girish K. Palshikar. "Treat us like the Sequences we are: Prepositional Paraphrasing of Noun Compounds using LSTM." COLING 2018.
  • LREC 2018 · Girishkumar Ponkiya, Kevin Patel, Pushpak Bhattacharyya, Girish K. Palshikar. "Towards a Standardized Dataset for Noun Compound Interpretation." LREC 2018.
  • ICON 2016 · Girishkumar Ponkiya, Pushpak Bhattacharyya, Girish K. Palshikar. "On Why Coarse Class Classification is a Bottleneck for Noun Compound Interpretation." ICON 2016.
  • arXiv 2021 · Siddhesh Pawar, Shyam Thombre, Anirudh Mittal, Girishkumar Ponkiya, Pushpak Bhattacharyya. "Tapping BERT for Preposition Sense Disambiguation." arXiv:2111.13972, 2021.
Building in the open.

vllm-dashboard

Open-source Telegraf → InfluxDB 2.0 → Grafana monitoring stack for vLLM inference servers. Monitors TTFT, throughput, KV-cache utilisation, HTTP responses, and Python GC.

Hugging Face Transformers

Bug reports (#4121, #4021), T5 pipeline solution (#3985), code review (PR #4367), and documentation commits.