Projects
An autonomous AI agent that loads a PDB-derived protein dataset, engineers features, trains a Random Forest classifier, evaluates its performance, and produces a full scientific report — all driven by GPT-4o function calling inside a LangGraph pipeline. The trained model is saved to disk and served via a Gradio prediction UI.
- Compared classical ML models with agentic AI workflows for protein classification in drug discovery
- Built an end-to-end pipeline using biological sequence data for training and evaluation
- Explored agentic systems for improving adaptability and decision-making in bioinformatics workflows
Machine learning workflows for classifying structural protein sequences using NLP, LSTM, and pre-trained LLM embeddings to support drug discovery.
- Lightweight, GPU-friendly workflows
- Methods balancing computational efficiency with biological relevance
- Comparisons between NLP, LSTM, and LLM-based approaches
- Exploratory data analysis (EDA) of protein sequences from the Protein Data Bank (PDB)
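One common way to treat protein sequences as text for NLP-style models is to split each sequence into overlapping k-mers and count them. The sketch below is illustrative only: the helper name `kmer_counts`, the choice of `k=3`, and the toy sequence are assumptions and are not taken from the project itself.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int = 3) -> Counter:
    """Count overlapping k-mers in an amino-acid sequence (hypothetical helper)."""
    seq = sequence.upper()
    # Slide a window of width k across the sequence, one residue at a time.
    # Sequences shorter than k produce an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Made-up toy sequence, used only to show the shape of the output.
counts = kmer_counts("MKVLMKV", k=3)
print(counts.most_common(2))
```

Counts like these can be fed to a bag-of-words style classifier, while LSTM or embedding-based models would typically consume the token sequence directly.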
Automated analysis of emerging biotech trends using AI, NLP, and web-based data aggregation, filtering, and visualization to provide actionable insights.
- An intelligence platform showcasing automated trend extraction from RSS feeds across the biotech ecosystem.
- Interactive dashboard allowing exploration of recent trends.
- Submitted as a Concierge Agent for the Google AI Agents Intensive Capstone Project.
Interactive dashboard that predicts property type (Detached, Attached, Condo) from MLS CSV data using gradient boosting models and Streamlit.
- Machine learning project completed during an externship with Berkshire Hathaway HomeServices.
- Data-driven dashboard to classify real estate listings by property type (detached, attached, or condo) using MLS data.
Interactive route planner to optimize park-hopping itineraries — compute efficient multi-stop sequences for theme parks, vacations, or travel days.
- Route optimization algorithm for a summer vacation road trip, created in collaboration with software engineers.
- Interactive website that plans a route for a summer road trip through national parks and provides helpful travel tips.
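The project's actual optimization algorithm is not described here, so the following is only a minimal sketch of one simple approach to multi-stop ordering: a greedy nearest-neighbor heuristic over great-circle distances. The function names and the approximate park coordinates are assumptions made for illustration.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_neighbor_route(stops, start):
    """Greedy ordering: always travel to the closest unvisited stop."""
    route = [start]
    remaining = set(stops) - {start}
    while remaining:
        here = stops[route[-1]]
        nxt = min(remaining, key=lambda name: haversine_km(here, stops[name]))
        route.append(nxt)
        remaining.remove(nxt)
    return route


# Approximate coordinates, for illustration only.
parks = {
    "Yosemite": (37.87, -119.54),
    "Zion": (37.30, -113.03),
    "Grand Canyon": (36.11, -112.11),
    "Arches": (38.73, -109.59),
}
route = nearest_neighbor_route(parks, "Yosemite")
print(route)
```

The greedy heuristic is fast and easy to reason about, but it does not guarantee the shortest overall route, and straight-line distance ignores real driving distances; a production planner would likely use road distances and a stronger solver.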
Pipelines, analyses, and training resources that I’ve developed to streamline genomic data analysis. Many projects stem from the Genomic Data Science Specialization (Johns Hopkins University – Coursera), along with hands-on work inspired by my experience in a molecular diagnostics laboratory.
- Portfolio of bioinformatics and NGS analysis projects, including end-to-end pipelines and variant analysis workflows developed through formal training and hands-on practice.
- Practical NGS data processing skills such as alignment (Bowtie2/BWA), variant calling (GATK, bcftools/samtools), quality control, and statistical summarization with Python and R.
- Workflow automation and reproducibility tools (Snakemake, bash scripting) with documentation and example scripts to support reproducible genomic data analyses.
Machine learning and data science projects from the TripleTen Data Science Professional Training Program.
- Showcases a comprehensive set of data science projects completed as part of the TripleTen professional training program, spanning data cleaning, exploratory analysis, hypothesis testing, and SQL-based data handling.
- Includes progressively advanced machine learning applications such as classification, regression, time series forecasting, NLP sentiment analysis, and computer vision models using industry-standard Python libraries.
- Demonstrates practical deployment and visualization skills with tools like Streamlit dashboards and interactive plots, highlighting the ability to communicate insights and build end-to-end analytic solutions.
