Publications

REFINESTAT: Efficient Exploration for Probabilistic Program Synthesis

Authors: Madhav Kanda, Shubham Ugare, Sasa Misailovic

Probabilistic programming offers a powerful framework for modeling uncertainty, yet statistical model discovery in this domain entails navigating an immense search space under strict domain-specific constraints. When small language models are tasked with generating probabilistic programs, they frequently produce outputs that suffer from both syntactic and semantic errors, such as flawed inference constructs. Motivated by probabilistic programmers’ domain expertise and debugging strategies, we introduce REFINESTAT, a language model–driven framework that enforces semantic constraints, ensuring that synthesized programs contain valid distributions and well-formed parameters, and then applies diagnostic-aware refinement by resampling prior or likelihood components whenever reliability checks fail. We evaluate REFINESTAT on multiple probabilistic-programming code-generation tasks using small language models (SLMs) and find that it produces programs that are both syntactically sound and statistically reliable, often matching or surpassing those from closed-source large language models (e.g., OpenAI o3).

View Publication

Towards Scalable Identification of Brick Kilns from Satellite Imagery with Active Learning

Authors: Madhav Kanda, Aditi Agarwal, Nipun Batra

Air pollution is a major global issue, worsened by unregulated brick production. Traditional kiln detection is slow; AI models can automate it, and active learning reduces the labeling cost of training them. Using active learning, we identified more than 700 kilns in India and deployed a web tool for automatic detection.

View Publication

SpiroActive: Active Learning for Efficient Data Acquisition for Spirometry

Authors: Ankita Jain, Madhav Kanda, Nipun Batra

Respiratory illnesses are a major health burden; COPD alone caused 3.23 million deaths in 2019. Spirometry aids diagnosis but is costly and inaccessible. Applying active learning to wearable spirometry reduces the amount of data that must be collected while maintaining model accuracy.

View Publication