Projects

Machine learning workflows for classifying structural protein sequences using NLP, LSTM, and pre-trained LLM embeddings to support drug discovery.

  • Lightweight, GPU-friendly workflows
  • Methods balancing computational efficiency with biological relevance
  • Comparisons between NLP, LSTM, and LLM-based approaches
  • Exploratory data analysis (EDA) of protein sequences from the Protein Data Bank (PDB)

Automated analysis of emerging biotech trends using AI, NLP, and web-based data aggregation, filtering, and visualization to provide actionable insights.

  • An intelligence platform showcasing automated trend extraction from RSS feeds across the biotech ecosystem.
  • Interactive dashboard allowing exploration of recent trends.
  • Submitted as a Concierge Agent for the Google AI Agents Intensive Capstone Project.

Interactive Streamlit dashboard that predicts property type (Detached, Attached, Condo) from MLS CSV data using gradient boosting models.

  • Machine learning project completed during an externship with Berkshire Hathaway HomeServices.
  • Data-driven dashboard to classify real estate listings by property type (detached, attached, or condo) using MLS data.

Interactive route planner for optimizing park-hopping itineraries by computing efficient multi-stop sequences for theme parks, vacations, or travel days.

  • Route optimization algorithm for a summer vacation road trip, created in collaboration with software engineers.
  • Interactive website for planning a summer road trip through national parks, with helpful travel tips.

Pipelines, analyses, and training resources that I’ve developed to streamline genomic data analysis. Many projects stem from the Genomic Data Science Specialization (Johns Hopkins University – Coursera); others are hands-on work inspired by my experience in a molecular diagnostics laboratory.

  • Portfolio of bioinformatics and NGS analysis projects, including end-to-end pipelines and variant analysis workflows developed through formal training and hands-on practice.

  • Practical NGS data processing skills such as alignment (Bowtie2/BWA), variant calling (GATK, bcftools/samtools), quality control, and statistical summarization with Python and R.

  • Workflow automation and reproducibility tools (Snakemake, bash scripting) with documentation and example scripts to support reproducible genomic data analyses.

Machine learning and data science projects from the TripleTen Data Science Professional Training Program.

  • Showcases a comprehensive set of data science projects completed as part of the TripleTen professional training program, spanning data cleaning, exploratory analysis, hypothesis testing, and SQL-based data handling.

  • Includes progressively advanced machine learning applications such as classification, regression, time series forecasting, NLP sentiment analysis, and computer vision models using industry-standard Python libraries.

  • Demonstrates practical deployment and visualization skills with tools such as Streamlit dashboards and interactive plots, highlighting the ability to communicate insights and build end-to-end analytic solutions.

Publications