I'm Harshi Gupta, a second-year B.Tech student in AI & Data Science at VIPS-TC, Delhi (CGPA: 9.16). I build applied ML systems: tabular data, predictive modeling, and end-to-end pipelines from training to deployed API. My work includes a mental health prediction system using TabNet (91.67% accuracy, 0.986 AUC), accepted at ICDAM 2026 (Springer LNNS, Scopus Indexed), and a book chapter on algorithmic fairness in AI. Competition highlights: Top 10 at IIT Roorkee's E-Summit, Round 2 at EY Techathon 6.0 (1.85L+ registrations) and YUVAi 2026 (2,400+ global teams). Secretary of CODEX. Open source contributor. Currently seeking ML research internships.
AI & Data Science undergraduate (CGPA: 9.16) specializing in tabular ML, predictive modeling, and production deployment. Research accepted at ICDAM 2026 (Springer LNNS, Scopus Indexed): 91.67% accuracy and 0.986 AUC via TabNet. Shortlisted at EY Techathon 6.0 (1.85L+ registrations) and YUVAi 2026 (2,400+ global teams). Seeking ML research internships.
CGPA: 9.16
Grade (Class XII): 81.2%
Grade (Class X): 89.7%
Below are sample data science projects built with Pandas, NumPy, Matplotlib, Seaborn, and Scikit-Learn.
Built an ML model to classify mental health risk in students using academic and demographic features: CGPA, depression indicators, treatment history, and year of study. Benchmarked XGBoost, LightGBM, CatBoost, and TabNet; TabNet performed best, with 91.67% accuracy and 0.986 AUC. Research under review at ICDAM 2026, Springer LNNS (Scopus-indexed).
AutoWorth AI is an end-to-end machine learning application that predicts used car prices, trained on 426,000+ real-world Craigslist listings. Built a full scikit-learn preprocessing pipeline, trained Random Forest and Linear Regression models achieving an R² of 0.87 and an MAE of $2.7K, and deployed a production FastAPI backend with a live public REST API, accessible through a Streamlit web frontend.
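A preprocessing-plus-model pipeline of the kind described above might look roughly like the sketch below. It is not the project's actual code: the column names (`year`, `odometer`, `manufacturer`, `fuel`), the six toy rows, and the imputation strategies are all assumptions chosen for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy listings; column names and values are illustrative only.
listings = pd.DataFrame({
    "year": [2012, 2018, None, 2015, 2020, 2009],
    "odometer": [120000, 40000, 85000, None, 15000, 160000],
    "manufacturer": ["ford", "toyota", "honda", "ford", "honda", "toyota"],
    "fuel": ["gas", "hybrid", "gas", "diesel", "gas", "gas"],
    "price": [6500, 21000, 11000, 14000, 27000, 4000],
})

numeric_cols = ["year", "odometer"]
categorical_cols = ["manufacturer", "fuel"]

# Numeric columns: fill gaps with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: fill gaps with the most frequent value, then one-hot
# encode; categories unseen during training are ignored at prediction time.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])

# Chaining preprocessing and the regressor keeps training and serving consistent.
model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", RandomForestRegressor(n_estimators=50, random_state=0)),
])

X = listings.drop(columns="price")
y = listings["price"]
model.fit(X, y)

# Predict the price of one new (hypothetical) listing.
new_listing = pd.DataFrame(
    [{"year": 2016, "odometer": 70000, "manufacturer": "honda", "fuel": "gas"}]
)
predicted_price = float(model.predict(new_listing)[0])
```

Bundling the preprocessing steps and the regressor into one `Pipeline` object means a single fitted artifact can be saved and loaded by an API backend, so raw listing fields go in and a price comes out without duplicating the cleaning logic.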
Processed and explored a Netflix dataset containing 8,800+ movies and TV shows across 12+ attributes, reducing missing values by 20-25% through data cleaning, type conversion, and feature engineering using Pandas. Derived content insights using Matplotlib and Seaborn, showing that 70%+ of titles are Movies, the US contributes 30% of total content, and post-2015 releases account for over 55% of the catalog.
Processed the Titanic dataset (891 passengers) using Pandas, handling missing values in the Age and Embarked columns and preparing categorical features, improving overall data usability by 15%. Visualized survival patterns with Matplotlib/Seaborn, showing 74% survival for females vs 19% for males and higher survival rates for 1st-class passengers than for lower classes.
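The cleaning steps described above could be sketched as follows. The six rows are a made-up sample that reuses the Titanic dataset's standard column names, and the fill strategies (median age, most common port) are assumptions rather than the project's confirmed choices. The survival rates computed here reflect only the toy sample, not the real dataset.

```python
import pandas as pd

# Hypothetical six-row sample; values are illustrative, not real passengers.
passengers = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass": [3, 1, 3, 2, 1, 3],
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0, None, 54.0],
    "Embarked": ["S", "C", None, "S", "S", "Q"],
})

# Assumed strategy: fill missing ages with the median age and
# missing embarkation ports with the most common port.
passengers["Age"] = passengers["Age"].fillna(passengers["Age"].median())
passengers["Embarked"] = passengers["Embarked"].fillna(
    passengers["Embarked"].mode()[0]
)

# Prepare categorical features: a binary flag for sex and
# one-hot columns for the embarkation port.
passengers["IsFemale"] = (passengers["Sex"] == "female").astype(int)
encoded = pd.get_dummies(passengers, columns=["Embarked"], prefix="Embarked")

# Survival rate per group (mean of the 0/1 Survived column).
survival_by_sex = passengers.groupby("Sex")["Survived"].mean()
survival_by_class = passengers.groupby("Pclass")["Survived"].mean()
```

The grouped means in `survival_by_sex` and `survival_by_class` are the kind of series that can be passed directly to a Matplotlib or Seaborn bar plot.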
Analyzed 200+ order records across 50+ customers and 30+ products, performing data cleaning, feature engineering, and revenue analysis using Python (Pandas, Matplotlib) to compute Customer Lifetime Value (CLV), top-10 customers, and monthly sales trends. Identified the highest revenue-generating product category and analyzed repeat vs one-time customer behavior, supported by category-wise, time-based, and customer-distribution visualizations.
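As a rough illustration of the revenue analysis above, the sketch below computes the same kinds of summaries on a made-up order table. The project does not state how CLV was defined, so this assumes the simplest version: total historical revenue per customer. All column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical order records; column names and values are illustrative.
orders = pd.DataFrame({
    "customer_id": ["C1", "C2", "C1", "C3", "C2", "C1"],
    "category": ["Electronics", "Books", "Books",
                 "Electronics", "Electronics", "Home"],
    "order_date": pd.to_datetime([
        "2024-01-05", "2024-01-20", "2024-02-11",
        "2024-02-15", "2024-03-02", "2024-03-09",
    ]),
    "quantity": [1, 3, 2, 1, 2, 4],
    "unit_price": [300.0, 15.0, 20.0, 450.0, 120.0, 25.0],
})

# Feature engineering: revenue per order line.
orders["revenue"] = orders["quantity"] * orders["unit_price"]

# Assumed CLV definition: total historical revenue per customer.
clv = orders.groupby("customer_id")["revenue"].sum().sort_values(ascending=False)
top_customers = clv.head(10)

# Monthly sales trend: total revenue per calendar month.
monthly_sales = orders.groupby(orders["order_date"].dt.to_period("M"))["revenue"].sum()

# Highest revenue-generating product category.
category_revenue = orders.groupby("category")["revenue"].sum()
top_category = category_revenue.idxmax()

# Repeat vs one-time customers, based on the number of orders placed.
order_counts = orders.groupby("customer_id").size()
repeat_customers = int((order_counts > 1).sum())
one_time_customers = int((order_counts == 1).sum())
```

A fuller CLV model would usually factor in purchase frequency, margin, and expected customer lifespan; the per-customer revenue total here is only the starting point.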
Use the details below to reach me!
Delhi, India