Girishkumar Ponkiya

Data Scientist (Vice President) · Applied AI & NLP

From Research to Production: Teaching Machines · Training Developers

Ph.D. · IIT Bombay | ACL · EMNLP · COLING · LREC | NLP & LLM Systems

About

Applied AI meets research rigour.

As an AI/ML specialist at ISS STOXX's Technology Innovation Lab, I work on LLM infrastructure and NLP pipelines — from deploying and optimising models on internal GPU infrastructure to building systems that process ~700K news articles daily across 10+ languages for ESG controversy monitoring.

My doctoral research at IIT Bombay on Noun Compound Interpretation addressed two complementary aspects. The first is automatic interpretation: uncovering implicit semantic relations between nouns through supervised LSTM paraphrasing and through unsupervised approaches with BERT/RoBERTa and T5, with the unsupervised approaches outperforming prior supervised baselines. The second is a linguistic contribution: proposing FrameNet-based semantic labels and demonstrating that FrameNet data can generalize label sets and enable models to handle unseen relation labels. This work was published at ACL, EMNLP, COLING, and LREC.

I also design and deliver a company-wide "Introduction to LLM for Software Developers" workshop series (~70 participants per session) and advise internal teams on AI use-case design and implementation. I contribute to open-source NLP tooling (Hugging Face Transformers) and have open-sourced a vLLM monitoring dashboard built on Telegraf, InfluxDB, and Grafana.

Experience

From research to production AI.

Data Scientist — ISS STOXX, Technology Innovation Lab

Mumbai, India · Mar 2021 – Present

Individual contributor; cross-functional AI advisor. Associate Vice President (Mar 2021 – Dec 2025) → Vice President (Jan 2026 – Present).

NLP & ESG pipelines: Built and maintain the NLP components of the news filtering pipeline (NER, entity linking, relevance scoring) — the pipeline processes ~700K articles/day across 10+ languages; relevance rate grew from ~2% to ~35% over 4 years.
LLM infrastructure: Deployed multiple LLMs on internal GPU infrastructure using vLLM with production-grade optimisations. Built and open-sourced the vllm-dashboard monitoring stack.
AI enablement: Delivering a company-wide "Introduction to LLM for Software Developers" workshop series (~70 participants/session); advising internal dev teams on AI use-case design and implementation.
Index products: Leading AI-driven thematic index and sentiment index initiatives for STOXX.

Technical contributions (NER · LLM · Knowledge Graphs)

Evaluated 5 NER tools against internal ESG data; integrated spaCy, CoreNLP, and Diffbot into the production NiFi pipeline.
Improved the pipeline's relevance classifier from 82%+ to 85%+ accuracy by replacing a spaCy-based component with a tuned scikit-learn model (feature engineering + hyperparameter search).
Contributed to the Japanese-language expansion, raising relevance from ~9% to ~33%.
Devised a novel threshold identification approach enabling mathematical confidence bounds on relevance and false positive rates — previously unmeasurable.
Redesigned entity linking around a Memgraph knowledge graph for better adaptability to dynamic entity universes.
Deployed LLMs (including Gemma3:27B) with FP8 quantization, KV-cache quantization, and multi-GPU tensor parallelism. Identified a CPU-level bottleneck (missing AVX2/AVX-512) that led to a hardware upgrade.
Built an AI-driven document-processing pipeline for the thematic index: PDF → Markdown (Docling) → section filtering → LLM summarisation.
Developed aspect-based sentiment analysis for the AI-driven STOXX sentiment index using E3 APIs and Vespa-backed vector embeddings.

Machine Learning Consultant — UnFound

Mumbai, India · Dec 2018 – Jun 2019 · Part-time

Part-time engagement during PhD research, taken deliberately to gain industry exposure and bridge the transition from academia to applied ML.

QA pipeline: Redefined the question-answering pipeline using BERT.
Stance detection: Developed a multi-task learning approach for stance detection.

Skills

Where theory meets tooling.

NLP & ML

Infrastructure & Systems

Frameworks & Languages

Education

Academic foundations.

Ph.D., Computer Science (Natural Language Processing)

Indian Institute of Technology Bombay (IIT Bombay), Mumbai

Thesis submitted: Mar 2021 · Degree awarded: 2023 · CPI: 7.7
Thesis: Noun Compound Interpretation. Supervisors: Prof. Pushpak Bhattacharyya (IIT Bombay) and Mr. Girish K. Palshikar (Tata Research Development and Design Centre (TRDDC), TCS Research).

Thesis arc: supervised LSTM paraphrasing → BERT/RoBERTa unsupervised MLM → T5 free-form generation. The unsupervised approaches outperformed prior supervised baselines without labelled data — the key finding of the thesis.

Best Poster (Technical), Research-Scholar Mela 2016, CSE Dept., IIT Bombay.

First Prize, poster session, RISC 2016.

Research details

The thesis addressed Noun Compound Interpretation (NCI): automatically uncovering implicit semantic relations between component nouns (e.g., student protest → "student is the doer of the protest"). The work spans two distinct axes:

1. Automatic interpretation: Multiple approaches of increasing sophistication: (a) supervised LSTM-based prepositional paraphrasing; (b) unsupervised fill-in-the-blank using BERT/RoBERTa; (c) unsupervised free-form paraphrasing using T5. The unsupervised models outperformed prior supervised baselines.

2. Linguistic contribution: Proposed FrameNet-based semantic relation labels for noun compounds, and demonstrated that FrameNet data can be leveraged to generalize the label space — enabling interpretation models to handle unseen relation labels via zero-shot learning (ConvE embeddings over FrameNet).

Relevant courses: Foundation of ML, NLP and the Web, Artificial Intelligence, Topics in NLP (Machine Translation).

The research originated during a 2014 TRDDC internship; internship mentor Girish K. Palshikar later became Ph.D. co-supervisor, a direct line from internship to completed Ph.D.

M.Tech., Computer Science

Indian Statistical Institute (ISI), Kolkata · 2011–2013 · 78.54%

Dissertation: Priority Search Tree for Secondary Memory and its Application (Q-MER Problem). Supervisor: Prof. Subhas C. Nandy.

Dissertation nominated for the Best Dissertation Award.

Dissertation details

Addressed the Q-MER (Query Maximal Empty Rectangle) problem in external memory — finding the largest axis-parallel empty rectangle around a query point among massive point sets that exceed main memory.

Contributions: (1) proposed an algorithm for an external priority search tree with linear space in log-linear time; (2) showed that all standard range queries can be answered efficiently; (3) gave an algorithm to solve MER with O(log n) extra space in main memory.

Relevant topics: Probabilistic Reasoning · Pattern Recognition · Algorithms & Complexity · Optimisation Techniques · Automata & Formal Languages

B.E., Computer Engineering

Saurashtra University, Rajkot · 2005–2009

Projects: GridApps (web application for distributed batch processing on a grid) and Web-PC (J2ME + web application for remote PC access).

Web-PC was selected in the Top 10 at the C2C Project Competition by the Government of Gujarat, TCS, and Microsoft.

Competitive Exams

GATE 2013 — 99.78 percentile · All India Rank 495

GATE 2011 — All India Rank 875 (M.Tech. admission at ISI Kolkata)

JEST 2013 — Rank 65

Publications

Peer-reviewed research.

ACL 2021 · Girishkumar Ponkiya, Diptesh Kanojia, Pushpak Bhattacharyya, Girish K. Palshikar. "FrameNet-assisted Noun Compound Interpretation." Findings of ACL-IJCNLP 2021.
EMNLP 2020 · Girishkumar Ponkiya, Rudra Murthy, Pushpak Bhattacharyya, Girish K. Palshikar. "Looking inside Noun Compounds: Unsupervised Prepositional and Free Paraphrasing using Language Models." Findings of EMNLP 2020.
COLING 2018 · Girishkumar Ponkiya, Kevin Patel, Pushpak Bhattacharyya, Girish K. Palshikar. "Treat us like the Sequences we are: Prepositional Paraphrasing of Noun Compounds using LSTM." COLING 2018.
LREC 2018 · Girishkumar Ponkiya, Kevin Patel, Pushpak Bhattacharyya, Girish K. Palshikar. "Towards a Standardized Dataset for Noun Compound Interpretation." LREC 2018.
ICON 2016 · Girishkumar Ponkiya, Pushpak Bhattacharyya, Girish K. Palshikar. "On Why Coarse Class Classification is a Bottleneck for Noun Compound Interpretation." ICON 2016.
arXiv 2021 · Siddhesh Pawar, Shyam Thombre, Anirudh Mittal, Girishkumar Ponkiya, Pushpak Bhattacharyya. "Tapping BERT for Preposition Sense Disambiguation." arXiv:2111.13972, 2021.

Google Scholar

Projects & Open Source

Building in the open.

vllm-dashboard

Open-source Telegraf → InfluxDB 2.0 → Grafana monitoring stack for vLLM inference servers. Monitors TTFT, throughput, KV-cache utilisation, HTTP responses, and Python GC.

Hugging Face Transformers

Bug reports (#4121, #4021), T5 pipeline solution (#3985), code review (PR #4367), and documentation commits.

Other contributions: keras_lr_finder · Streamlit · BetterBib · framenet-tools · ProbCog · NLP-progress (proposed noun compound task)