Hi @bsureshkrishna,
My name is Syed Muhammad Hussain. I'm a Computer Science graduate from Habib University, currently working as an ML Engineer at Beam AI.
I've gone through the project description in detail and I'm very interested in contributing to AStats for GSoC 2026. Before jumping into code, I spent time researching the current landscape: existing tools like JASP, Jamovi, and SPSS (menu-driven, rigid), as well as LLM-powered alternatives like PandasAI, LAMBDA, and DeepAnalyze. A key gap I identified is that none of these tools prioritize statistical correctness. They either expect the user to know which test to pick, or they generate code without checking assumptions. A recent survey ("A Survey on LLM-based Agents for Statistics and Data Science", arXiv:2412.14222) confirms this: current LLM agents frequently apply inappropriate tests to non-normal distributions.
Based on this research, I've put together a Proof of Concept proposal document that covers:
- My understanding of the problem and where existing tools fall short
- PoC scope — a working LangChain-based ReAct agent that loads any tabular dataset, checks assumptions (normality, variance, sample size) before every test, automatically selects the appropriate parametric or non-parametric method, and reports results with effect sizes, interpretations, and warnings
- Decision flow — how the agent reasons through: profile → plan → check assumptions → select test → execute → interpret → suggest next steps
- Tech stack — Python, LangChain, Ollama (open-weight models), scipy/statsmodels
- What's in scope vs. out of scope for the PoC vs. the full GSoC project
- An example session showing expected agent output
- Questions for mentors — I have a few specific questions on scope, statistical priorities, and framework choice that I'd appreciate your input on
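
To make the "check assumptions → select test" step above concrete, here is a minimal sketch of how it could look for two independent numeric groups. This is only an illustration, not the PoC implementation: the function name `compare_two_groups`, the 0.05 threshold, and the choice of Shapiro-Wilk, Levene, and Cohen's d are my assumptions, and the LangChain/Ollama agent layer is left out entirely.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # assumed threshold for the assumption checks
MIN_N = 3     # scipy.stats.shapiro needs at least 3 observations


def compare_two_groups(a, b, alpha=ALPHA):
    """Hypothetical helper: check assumptions, then select and run a two-sample test."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Sample-size check comes first, since the later checks depend on it.
    if min(len(a), len(b)) < MIN_N:
        raise ValueError(f"each group needs at least {MIN_N} observations")

    # Normality of each group (Shapiro-Wilk).
    normal = bool(
        stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha
    )
    # Homogeneity of variance (Levene's test).
    equal_var = bool(stats.levene(a, b).pvalue > alpha)

    warnings = []
    if normal:
        # Student's t-test if variances look equal, Welch's t-test otherwise.
        test = "Student's t-test" if equal_var else "Welch's t-test"
        result = stats.ttest_ind(a, b, equal_var=equal_var)
        if not equal_var:
            warnings.append("unequal variances; used Welch's correction")
    else:
        # Fall back to a rank-based test when normality is rejected.
        test = "Mann-Whitney U"
        result = stats.mannwhitneyu(a, b, alternative="two-sided")
        warnings.append("normality rejected; used a non-parametric test")

    # Effect size: Cohen's d with a pooled standard deviation.
    n_a, n_b = len(a), len(b)
    pooled_sd = np.sqrt(
        ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    )
    cohens_d = float((a.mean() - b.mean()) / pooled_sd)

    return {
        "test": test,
        "statistic": float(result.statistic),
        "p_value": float(result.pvalue),
        "cohens_d": cohens_d,
        "normal": normal,
        "equal_var": equal_var,
        "warnings": warnings,
    }
```

Calling `compare_two_groups(group_a, group_b)` returns a plain dictionary, which in the PoC the agent could turn into the interpretation and warnings shown to the user. Using p-value thresholds for the assumption checks is a simplification I would want mentor input on.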
PoC Proposal Document: https://docs.google.com/document/d/1S1ydXT9GtlE1txFv_YxKixLsvj-Y0X4Dmm9d-PwJ970/edit?usp=sharing
I'd appreciate any feedback on the direction, scope, or priorities before I start building. Happy to adjust the approach based on your guidance.
Thanks for your time!
Syed Muhammad Hussain
Machine Learning Engineer, Beam AI
Research Intern, Empathic Computing Lab – University of Auckland
B.Sc. Computer Science, Habib University
LinkedIn | GitHub | Google Scholar