Ryan Burns


Data Scientist at Certilytics

MS in Data Science with a specialization in Artificial Intelligence, completed at Northwestern University in June 2022.

View My LinkedIn Profile

View My GitHub Profile

Projects | Resume

Project Portfolio

Certilytics Projects
Word Embedding Hyperparameter Tuning


I retrained a deep word embedding representation model on newly acquired data containing patient-level medical system utilization sequences, then ran hyperparameter tuning experiments and analysis. I tested a hyperparameter grid of 22 model configurations, and the optimal setting I ultimately selected produced an average improvement of 5% in AUC (for classification models) or R² (for regression models) across the entire model suite.

To evaluate each word embedding, I trained four end-to-end models per embedding, one for each of the four model pipelines that contain it, and scored these models to conduct cross-model extrinsic evaluations. After the extrinsic evaluations identified the optimal final configuration, I partnered with Certilytics' internal clinical expert on an intrinsic evaluation of the embedding: a custom clustering challenge on hand-selected medical codes that would naturally be expected to cluster together when similar and lie far apart when dissimilar.
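One simple way to score a clustering challenge like this is to compare the average cosine similarity between codes an expert placed in the same group against the average similarity between codes in different groups. The sketch below is illustrative only: the function names, the toy two-dimensional vectors, and the placeholder code labels are hypothetical and are not the Certilytics evaluation.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def cluster_separation(embeddings, groups):
    """Return (mean within-group similarity, mean between-group similarity).

    embeddings: dict mapping a medical code to its embedding vector.
    groups: list of code lists an expert expects to cluster together
            (needs at least two groups, each with at least two codes).
    """
    within, between = [], []
    for group in groups:
        # Every pair of codes inside the same expert-defined group
        within.extend(cosine(embeddings[a], embeddings[b]) for a, b in combinations(group, 2))
    for g1, g2 in combinations(groups, 2):
        # Every cross pair between two different groups
        between.extend(cosine(embeddings[a], embeddings[b]) for a in g1 for b in g2)
    return sum(within) / len(within), sum(between) / len(between)


# Toy example with hypothetical codes and 2-D vectors
embeddings = {
    "code_a1": [1.0, 0.0], "code_a2": [0.9, 0.1],
    "code_b1": [0.0, 1.0], "code_b2": [0.1, 0.9],
}
groups = [["code_a1", "code_a2"], ["code_b1", "code_b2"]]
within, between = cluster_separation(embeddings, groups)
print(f"within={within:.2f} between={between:.2f}")  # within=0.99 between=0.11
```

A useful embedding should show within-group similarity well above between-group similarity; a clinical reviewer can then inspect the individual pairs that break that pattern.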

The final embedding sits at the center of most model pipelines within the Certilytics model suite.


AIR Communities Projects
Future Lease Projection Application


I was tasked with building a program to generate projection scenarios for future leases across AIR Communities' apartment property portfolio. The projections informed the organization's budgeting and forecasting process. I was first asked to own this project after an Excel-based projection for a single property (of the ~100 owned) proved unable to handle the complete unit-level output and took close to an hour to run.

I built the logic into a Python program that writes its results to a new SQL table used by the Decision Support team, which consumes the projections for forecasting. While the logic and calculations behind the forecast are proprietary, the video below shows the GUI application I built on top of the program and bundled into an executable so that Decision Support staff can rerun the program independently while adjusting model inputs. I used Tkinter to develop the GUI and PyInstaller to create the executable.

This project transformed the forecasting process at AIR Communities: because the deliverable was fast and user-friendly, the organization could extend its forecast horizon and project its financial future across a greater number of scenarios.



CoStar Property Data Scraping


This project collected more than forty attributes for over 850 competitor multifamily apartment properties from the CoStar property research platform. The program collected, cleansed, and loaded the data into storage in under eight minutes from start to finish. CoStar has since updated its Terms of Use to explicitly prohibit the web scraping and reverse-engineering techniques this program used. I ultimately led the project in an alternate direction to acquire similar data while keeping the business in compliance with CoStar's Terms of Use, and I have shared the original program as proof of work.


View on GitHub

AutoML Demo with DataRobot


I created a tutorial and video demonstration of the automated machine learning (AutoML) tool DataRobot. The tutorial demonstrates integrating DataRobot into a project that applies sentiment analysis to daily chatbot messages in order to rank prospects for follow-up outreach the next day. The final application can be viewed in the separate Prospect Ranked Follow-up Application repository.


View on GitHub

Personal Projects
NLP Miniature BERT Model Case Study


This project is a case study in developing NLP applications in a low-resource corporate environment operating a client-centric, service-based business model. I pretrained miniature BERT masked language models on a domain-adapted vocabulary sourced from client-facing research documents, and demonstrated modest improvements over the baseline when the models were fine-tuned to categorize client consultation requests by topic.
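Masked language model pretraining in the style of the original BERT recipe selects about 15% of token positions for prediction, replaces 80% of those with the [MASK] token, swaps 10% for a random token, and leaves 10% unchanged. The pure-Python sketch below shows only that masking step; the function name is hypothetical, and the -100 label for unselected positions assumes the default `ignore_index` of PyTorch's `CrossEntropyLoss`. It is not the code used in this project.

```python
import random


def mask_tokens(token_ids, mask_id, vocab_size, mask_prob=0.15, rng=None):
    """Apply BERT-style 80/10/10 masking to a list of token ids.

    Returns (inputs, labels): labels hold the original id at positions
    selected for prediction and -100 (ignored by the loss) elsewhere.
    """
    rng = rng or random.Random()
    inputs, labels = list(token_ids), [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id  # 80%: replace with the [MASK] id
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token id
        # remaining 10%: keep the original token unchanged
    return inputs, labels


ids = [101, 2023, 2003, 1037, 7099, 102]  # hypothetical token ids
inputs, labels = mask_tokens(ids, mask_id=103, vocab_size=30522, rng=random.Random(0))
```

Production implementations also exclude special tokens such as [CLS] and [SEP] from masking; this sketch does not.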


View on GitHub

Recording and Transcription Web Scraping Toolkit


I developed this toolkit to automate the collection of video recordings, recording metadata, and transcripts from a variety of video conferencing, video hosting, and transcription platforms. I used these tools throughout my four years in client relationship management, remotely supporting a territory of hundreds of clients.


View on GitHub

Biweekly Sales Reports Automation


As the lucky husband of the founder of The Beverly Collective, a Colorado-based art collective, I built this program to reduce the manual work of sending biweekly sales report emails to the 30+ artists and makers vending through the collective. I wrote the program in under 5 hours, and it cut the monthly workload from 10 hours to 2, which are now focused on email validation, payment processing, and vendor support. The program uses the Gmail API to obtain user authorization and create email drafts in the user's mailbox, and it reads the Excel sales files into Python with the openpyxl library.
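The Gmail API's `users.drafts.create` method takes a request body whose `message.raw` field holds a base64url-encoded RFC 2822 message. A minimal sketch of the two steps described above, aggregating sales per vendor and building a draft body, is below. The three-column row layout, the email wording, and the function names are assumptions for illustration, not the program's actual code.

```python
import base64
from collections import defaultdict
from email.message import EmailMessage


def totals_by_vendor(rows):
    """Sum sale amounts per vendor from (vendor_email, item, amount) rows."""
    totals = defaultdict(float)
    for vendor_email, _item, amount in rows:
        totals[vendor_email] += amount
    return dict(totals)


def draft_body(sender, vendor_email, total, period):
    """Build a request body for the Gmail API users.drafts.create method."""
    msg = EmailMessage()
    msg["To"] = vendor_email
    msg["From"] = sender
    msg["Subject"] = f"Sales report: {period}"
    msg.set_content(f"Your sales for {period} totaled ${total:,.2f}.")
    # The API expects the full RFC 2822 message, base64url-encoded, under message.raw
    raw = base64.urlsafe_b64encode(msg.as_bytes()).decode("ascii")
    return {"message": {"raw": raw}}
```

In a full program, the rows could come from openpyxl (for example `load_workbook(path, read_only=True).active.iter_rows(min_row=2, values_only=True)`), and each body would be passed to an authorized google-api-python-client service as `service.users().drafts().create(userId="me", body=body).execute()`. Creating drafts rather than sending directly leaves room for the manual validation step.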


View on GitHub

Continuous Glucose Monitor Modeling


This project is designed to extend personal diabetes data and insights into real-time streaming, IoT integrations, and predictive data science modeling. It builds on the diabetes data democratization enabled by Nightscout, an open-source cloud application that people with diabetes, providers, and caregivers use to store, visualize, and share data from their Continuous Glucose Monitoring (CGM) sensors in real time.

Now that the sensor data is accessible through a web-hosted MongoDB database, I am actively pursuing two aims with this project:

  1. Extend data availability to IoT devices that display current blood glucose levels and directional trends.
  2. Apply cutting-edge ML/AI modeling techniques to train novel predictive algorithms and compare their performance to current industry standards.
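Displaying a directional trend requires turning recent readings into a label. One way to do that is sketched below: fit a least-squares slope in mg/dL per minute, then bucket it into Nightscout's direction labels (Flat, FortyFiveUp, SingleUp, DoubleUp, and their Down counterparts). The 1, 2, and 3 mg/dL/min cutoffs are an assumption approximating common CGM trend-arrow conventions, not values from this project, and the function names are hypothetical.

```python
def rate_of_change(readings):
    """Least-squares glucose slope in mg/dL per minute.

    readings: (epoch_ms, sgv_mg_dl) tuples, such as the `date` and `sgv`
    fields of Nightscout entries; needs at least two distinct timestamps.
    """
    n = len(readings)
    xs = [t / 60_000 for t, _ in readings]  # milliseconds -> minutes
    ys = [g for _, g in readings]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def trend_direction(rate):
    """Map a rate (mg/dL/min) to a Nightscout-style direction label."""
    magnitude = abs(rate)
    if magnitude < 1:
        return "Flat"
    if magnitude < 2:
        label = "FortyFive"  # slowly changing
    elif magnitude < 3:
        label = "Single"  # changing
    else:
        label = "Double"  # rapidly changing
    return label + ("Up" if rate > 0 else "Down")


# Readings every 5 minutes, rising 10 mg/dL each time (2 mg/dL/min)
readings = [(0, 100), (300_000, 110), (600_000, 120)]
print(trend_direction(rate_of_change(readings)))  # SingleUp
```

A regression slope over several readings is less sensitive to a single noisy sensor value than a difference between the last two points.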


View on GitHub

PDGA Prediction Modeling Utility Scripts


I co-authored a blog series, hosted on Ultiworld Disc Golf, that predicted player performance at elite series events. I contributed player performance web scraping and GIS data collection capabilities, cleaned and preprocessed data, and edited post content. The scripts in this repository demonstrate some of the larger data collection efforts that fed parts of the model. This was my first project in Python, and I am revisiting the scripts to polish them. The blog posts are available on the Ultiworld Disc Golf website.


View on GitHub