Ryan Burns

Headshot

Data Scientist with healthcare domain expertise, specializing in ML and AI.

MS in Data Science, Artificial Intelligence specialization, Northwestern University '22.

View My LinkedIn Profile

View My GitHub Profile

Projects | Resume

Project Portfolio

Certilytics Projects
GenAI Patient Healthcare Pathway Modeling


Impact: Developed and productionized custom transformer architectures that achieved high-precision denial predictions, streamlining claims processing and improving financial efficiency.

To address the operational challenge of health insurance claim denials, I designed custom transformer-based neural network architectures. Unlike off-the-shelf solutions, these models were specifically tailored to capture the intricate relationships within complex claims data, resulting in high-performance metrics for both binary (denial vs. approval) and multi-class (denial reason) predictions.

I led the end-to-end deployment of these models into production for daily batch inference. The implementation utilized:

  • MLflow: For comprehensive model lifecycle management, including experiment tracking and versioning.
  • AWS: For building a scalable and reliable cloud infrastructure.
  • Automated Insights: For empowering organizations to proactively address denials and optimize patient experiences.


Health Insurance Claims Denial Neural Network


Impact: Productionized custom transformer architectures that achieved high-precision denial predictions, directly optimizing financial recovery and claims processing efficiency.

To tackle the financial strain caused by unpredictable health insurance claim denials, I designed custom transformer-based neural network architectures tailored to the non-linear complexities of medical billing data. While standard models often struggle with the multi-dimensional relationships in claims, these specific architectures were engineered to identify potential denials and their underlying reasons before they impacted the bottom line.

I led the end-to-end deployment of these models for daily batch inference, ensuring a scalable and reliable production environment. The implementation featured:

  • Advanced Architecture: Developed specialized transformers to capture intricate data relationships that off-the-shelf solutions missed.
  • MLOps Excellence: Utilized MLflow for comprehensive model lifecycle management (experiment tracking and versioning) and AWS for cloud-scale infrastructure.
  • Operational Value: By automating the identification of both binary and multi-class denial outcomes, the system enables proactive intervention and significantly reduces manual review time.
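The dual-output design can be illustrated with a minimal sketch of the two prediction heads: a binary head scoring P(denial) and a multi-class head distributing probability across denial reasons. The feature weights, reason labels, and linear scoring below are purely illustrative; the production models are transformer-based.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative denial reasons, not the actual label set.
REASONS = ["missing_auth", "coding_error", "not_covered"]

def score_claim(features, w_denial, w_reasons, bias=0.0):
    """Binary head: P(denial). Multi-class head: P(reason | denial)."""
    z = sum(f * w for f, w in zip(features, w_denial)) + bias
    p_denial = sigmoid(z)
    logits = [sum(f * w for f, w in zip(features, row)) for row in w_reasons]
    p_reasons = softmax(logits)
    return p_denial, dict(zip(REASONS, p_reasons))
```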


Word Embedding Hyperparameter Tuning


Impact: Optimized the core medical data representation layer for the entire model suite, driving a 5% average performance lift across all downstream classification and regression tasks.

Following the acquisition of new patient-level medical utilization data, I led the retraining and optimization of the foundational word embeddings that power the Certilytics ecosystem. The objective was to identify the optimal hyperparameter configurations to represent complex medical sequences more effectively than baseline models.

To ensure the new embeddings were both statistically superior and clinically sound, I implemented a rigorous multi-stage validation process:

  • Systematic Optimization: Executed a hyperparameter grid search across 22 distinct model configurations to isolate the settings that best captured medical utilization patterns.
  • Extrinsic Evaluation: Developed a robust testing framework that involved training four end-to-end models for every embedding candidate across four different pipelines, ensuring that improvements in the latent space translated to real-world predictive power.
  • Intrinsic Clinical Validation: Partnered with internal clinical experts to conduct a custom clustering challenge, verifying that the embeddings accurately reflected medical logic and established relationships between codes.
  • Enterprise Integration: Successfully deployed the final configuration, which resulted in a 5% improvement in AUC and R² scores and now serves as the central embedding layer for the majority of production pipelines.
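A systematic search like the 22-configuration grid above can be enumerated with `itertools.product`. The hyperparameter names and values here are illustrative, not the actual search space:

```python
from itertools import product

# Illustrative embedding hyperparameters; the real search spanned 22 configurations.
GRID = {
    "embedding_dim": [128, 256],
    "window_size": [5, 10, 20],
    "min_count": [1, 5],
}

def grid_configs(grid):
    """Yield every hyperparameter combination as a dict."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(grid_configs(GRID))  # 2 * 3 * 2 = 12 candidate configurations
```

Each candidate config then feeds the extrinsic evaluation loop, training downstream models per embedding to confirm gains in the latent space carry through to predictive power.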


AIR Communities Projects
Future Lease Projection Application


Impact: Replaced a slow, single-property Excel process with a high-performance Python application, enabling multi-scenario portfolio forecasting and reducing calculation time from hours to seconds.

Situation: The existing Excel-based projection tool was unable to handle unit-level data for the full ~100 property portfolio, with a runtime of nearly an hour per property.

Task: I was asked to engineer a scalable solution that could independently generate budget and forecast projection scenarios across the entire portfolio.

Action:

  • Developed a Python-based engine to handle complex, unit-level calculations that previously overwhelmed spreadsheet software.
  • Created a custom GUI using Tkinter and bundled the program into an executable via PyInstaller, allowing non-technical staff to run scenarios on-demand.
  • Integrated output directly into SQL tables, streamlining data consumption for the Decision Support team.

Result: Transformed the forecasting process into a "disruptive innovation" that expanded the forecast horizon and allowed the organization to analyze a significantly higher volume of financial scenarios with precision.
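The core of a unit-level, multi-scenario projection can be sketched as below. This is a simplified illustration (a compounding rent projection over named growth scenarios), not the production engine:

```python
def project_rents(units, months, scenarios):
    """Project monthly rent per unit under each growth scenario.

    units: {unit_id: current_rent}
    scenarios: {scenario_name: monthly_growth_rate}
    Returns {scenario_name: {unit_id: [projected rent for each month]}}.
    """
    out = {}
    for name, rate in scenarios.items():
        out[name] = {
            uid: [round(rent * (1 + rate) ** m, 2) for m in range(1, months + 1)]
            for uid, rent in units.items()
        }
    return out
```

Because each scenario is just another dictionary entry, the same pass covers the whole portfolio at once, which is exactly where a per-property spreadsheet breaks down.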


CoStar Property Data Scraping


Impact: Engineered a high-velocity data acquisition pipeline to collect 34,000+ data points across 850+ competitor properties, reducing market research cycles from weeks to under eight minutes.

To facilitate deep competitive analysis across a ~100 property portfolio, I developed an automated scraping and ETL toolkit designed to extract 40+ attributes for over 850 multi-family apartment homes. The objective was to replace a fragmented manual research process with a unified, high-speed program capable of cleansing and loading data directly into internal storage.

The resulting architecture achieved end-to-end execution—from raw collection to validated load—in less than eight minutes. Key technical highlights include:

  • Performance Optimization: Engineered request logic to handle high-volume data extraction with minimal latency and high reliability.
  • Automated Data Cleansing: Built-in validation layers ensured that all 34,000+ data points were cleansed and formatted for immediate consumption by the analytics team.
  • Strategic Compliance: After CoStar updated their terms of use to restrict scraping, I proactively led the project in a new direction to acquire similar data via alternate channels, ensuring the business remained in total compliance while maintaining the analytical advantage.
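The reliability side of the request logic comes down to retrying transient failures with exponential backoff. A generic sketch of that pattern (not the actual scraper code):

```python
import time

def fetch_with_retry(fetch, url, retries=3, backoff=0.5):
    """Call fetch(url); on failure, retry with exponentially growing delays.

    Re-raises the last exception once all attempts are exhausted.
    """
    for attempt in range(retries):
        try:
            return fetch(url)
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * 2 ** attempt)
```

Wrapping every outbound call this way keeps throughput high while absorbing the intermittent errors that high-volume extraction inevitably hits.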
View on GitHub CodeFactor Repo Grade


AutoML Demo with DataRobot


Impact: Automated the lead prioritization workflow by integrating DataRobot AutoML, transforming unstructured chatbot data into a ranked follow-up queue that prioritizes high-intent prospects.

To showcase the efficiency of automated machine learning in sales operations, I built a proof-of-concept and technical tutorial utilizing DataRobot. The project addresses the operational challenge of high-volume chatbot interactions by automatically scoring sentiment and intent, allowing outreach teams to bypass manual lead triaging and focus on the most promising conversions.

The implementation and demonstration highlighted:

  • Rapid Prototyping: Leveraged DataRobot’s AutoML engine to rapidly iterate through diverse model architectures and feature sets, significantly reducing the time-to-value for a production-ready sentiment classifier.
  • Sentiment-Driven Prioritization: Developed a logic layer that processes daily message logs to rank prospect follow-ups based on real-time linguistic cues and engagement intent.
  • End-to-End Visibility: Produced a comprehensive video demonstration and technical guide to show how AutoML can be integrated into existing sales stacks to drive measurable outreach efficiency.
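The prioritization layer reduces to a weighted score and a sort. A minimal sketch (field names and weights are illustrative; the actual scores come from the DataRobot model):

```python
def rank_followups(messages, weights=None):
    """Rank chatbot conversations for follow-up, highest score first.

    messages: list of dicts with 'prospect', 'sentiment' (-1..1), and
    'intent_cues' (count of high-intent phrases detected in the log).
    """
    weights = weights or {"sentiment": 1.0, "intent_cues": 0.5}

    def score(m):
        return (weights["sentiment"] * m["sentiment"]
                + weights["intent_cues"] * m["intent_cues"])

    return sorted(messages, key=score, reverse=True)
```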

View on GitHub CodeFactor Repo Grade


Personal Projects
NLP Miniature BERT Model Case Study


This project is a case study on developing NLP applications in a low-resource corporate environment operating a client-centric, service-based business model. I pretrained miniature BERT masked language models on a domain-adapted vocabulary sourced from client-facing research documents, and demonstrated modest improvements over baseline when fine-tuned to categorize client consultation requests by topic.
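The masking procedure at the heart of masked-language-model pretraining follows BERT's 80/10/10 rule: of the selected tokens, 80% become `[MASK]`, 10% become a random vocabulary token, and 10% stay unchanged. A simplified stdlib sketch with an illustrative vocabulary:

```python
import random

VOCAB = ["client", "consult", "topic", "request", "review"]  # illustrative

def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """BERT-style masking. Returns (masked_tokens, labels), where labels
    hold the original token at masked positions and None elsewhere."""
    rng = rng or random.Random(0)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)
            roll = rng.random()
            if roll < 0.8:
                masked.append("[MASK]")        # 80%: mask token
            elif roll < 0.9:
                masked.append(rng.choice(VOCAB))  # 10%: random token
            else:
                masked.append(tok)             # 10%: unchanged
        else:
            labels.append(None)
            masked.append(tok)
    return masked, labels
```

The model is then trained to recover the labels at the selected positions only, which is what lets a small domain corpus still provide a useful pretraining signal.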


View on GitHub CodeFactor Repo Grade

Recording and Transcription Web Scraping Toolkit


I developed this toolkit to automate the collection of video recordings, recording metadata, and transcripts from a variety of video conferencing, video hosting, and transcription service platforms. I used the toolkit during my four years in client relationship management, remotely supporting a territory of hundreds of clients.


View on GitHub CodeFactor Repo Grade

Biweekly Sales Reports Automation


As the lucky husband to the founder of The Beverly Collective, a Colorado-based art collective, I built this program to reduce the manual work of sending biweekly sales report emails to the 30+ artists and makers vending through the collective. I completed the program in less than 5 hours, cutting the monthly workload from 10 hours to about 2 hours focused on email validation, payment processing, and vendor support. The program leverages the Gmail API to gather user permissions and create email drafts within the user's account, and consumes Excel files using the openpyxl library.
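Creating a draft through the Gmail API requires a base64url-encoded MIME message under the `raw` key of the request body. A minimal sketch of assembling that body with the standard library (addresses and report text are illustrative):

```python
import base64
from email.message import EmailMessage

def build_draft_body(to_addr, subject, report_text):
    """Build the request body for Gmail's drafts.create endpoint:
    a base64url-encoded RFC 2822 message under message.raw."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = subject
    msg.set_content(report_text)
    raw = base64.urlsafe_b64encode(msg.as_bytes()).decode("ascii")
    return {"message": {"raw": raw}}
```

The returned dict is what gets passed to the authenticated drafts-create call, leaving the draft sitting in the user's account ready for review before sending.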


View on GitHub CodeFactor Repo Grade

Continuous Glucose Monitor Modeling


This project is designed to extend personal diabetes data and insights into the realm of real-time streaming, IoT integrations, and predictive modeling. It builds on a foundation of diabetes data democratization facilitated by Nightscout, an open-source cloud application that people with diabetes, providers, and caretakers use to visualize, store, and share data from their Continuous Glucose Monitoring sensors in real time.

Having recently established sensor data accessibility via a web-hosted MongoDB database, I am actively pursuing two aims with this project:

  1. Extend data availability to IoT devices that visualize current blood glucose levels and directional trends.
  2. Apply cutting-edge ML/AI modeling techniques to train novel predictive algorithms and compare their performance to current industry standards.
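A natural baseline for the second aim is a simple linear trend fit over recent sensor readings, extrapolated a fixed horizon ahead. This is an illustrative sketch for benchmarking, not one of the industry algorithms:

```python
def predict_glucose(readings, horizon_min=30):
    """Least-squares linear trend over recent (minute, mg/dL) readings,
    extrapolated horizon_min minutes past the last sample."""
    n = len(readings)
    xs = [t for t, _ in readings]
    ys = [g for _, g in readings]
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return slope * (xs[-1] + horizon_min) + intercept
```

Any learned model would need to beat this extrapolation (and the CGM vendor's own trend arrows) to justify its complexity.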

View on GitHub CodeFactor Repo Grade

PDGA Prediction Modeling Utility Scripts


I co-authored a blog series hosted on Ultiworld Disc Golf predicting disc golf player performance at elite series events. I contributed player-performance web scraping and GIS data collection capabilities, cleaned and preprocessed data, and edited post content. The scripts hosted in this repository demonstrate some of the larger data collection efforts feeding parts of the model. This was my first project in Python, and I am revisiting the files to clean them up. The blog posts are available on the Ultiworld Disc Golf website.


View on GitHub CodeFactor Repo Grade