Data Scientist with Healthcare domain expertise specializing in ML and AI.
MS in Data Science, Artificial Intelligence specialization, Northwestern University '22.
View My LinkedIn Profile
View My GitHub Profile
Impact: Developed and productionized custom transformer architectures that achieved high-precision denial predictions, streamlining claims processing and improving financial efficiency.
To address the operational challenge of health insurance claim denials, I designed custom transformer-based neural network architectures. Unlike off-the-shelf solutions, these models were specifically tailored to capture the intricate relationships within complex claims data, resulting in high-performance metrics for both binary (denial vs. approval) and multi-class (denial reason) predictions.
I led the end-to-end deployment of these models into production for daily batch inference. The implementation utilized:
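The production stack itself isn't detailed here, but the core idea, representing each claim as a sequence of medical codes and feeding fixed-length ID sequences to a transformer, can be sketched in miniature. The code values and vocabulary handling below are illustrative stand-ins, not the production implementation:

```python
def build_vocab(claims, specials=("[PAD]", "[UNK]")):
    """Map each medical code seen in training claims to an integer ID."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    for claim in claims:
        for code in claim:
            vocab.setdefault(code, len(vocab))
    return vocab

def encode(claim, vocab, max_len=8):
    """Convert one claim's code sequence to fixed-length IDs for the model."""
    ids = [vocab.get(code, vocab["[UNK]"]) for code in claim[:max_len]]
    ids += [vocab["[PAD]"]] * (max_len - len(ids))
    return ids

# Toy training claims: mixes of diagnosis (ICD-10) and procedure (CPT) codes.
train = [["E11.9", "99213"], ["I10", "99214", "J45.909"]]
vocab = build_vocab(train)

# An unseen code falls back to [UNK]; short claims are padded to max_len.
x = encode(["E11.9", "XYZ"], vocab, max_len=4)
```

Batches of such ID sequences are what a transformer encoder consumes, with the padding positions masked out of attention.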
Impact: Optimized the core medical data representation layer for the entire model suite, driving a 5% average performance lift across all downstream classification and regression tasks.
Following the acquisition of new patient-level medical utilization data, I led the retraining and optimization of the foundational word embeddings that power the Certilytics ecosystem. The objective was to identify the optimal hyperparameter configurations to represent complex medical sequences more effectively than baseline models.
To ensure the new embeddings were both statistically superior and clinically sound, I implemented a rigorous multi-stage validation process:
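As one illustration of the clinical-sanity stage, nearest-neighbor checks on learned vectors can confirm that related therapies cluster together. A minimal sketch with toy 2-dimensional vectors follows; the real embeddings, dimensionality, and code names differ:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest(code, embeddings, k=3):
    """Rank the other codes by similarity for clinical sanity review."""
    scored = [(other, cosine(embeddings[code], vec))
              for other, vec in embeddings.items() if other != code]
    return sorted(scored, key=lambda t: -t[1])[:k]

# Toy vectors: diabetes therapies should sit near each other,
# far from unrelated codes like an orthopedic cast.
embeddings = {"insulin": [1.0, 0.0], "metformin": [0.9, 0.1], "cast": [0.0, 1.0]}
neighbors = nearest("insulin", embeddings)
```

A clinician-reviewable report of such neighbor lists is one cheap way to catch embeddings that are statistically strong but clinically nonsensical.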
Impact: Replaced a slow, single-property Excel process with a high-performance Python application, enabling multi-scenario portfolio forecasting and reducing calculation time from hours to seconds.
Situation: The existing Excel-based projection tool could not handle unit-level data for the full ~100-property portfolio, and runtimes approached an hour per property.
Task: I was asked to engineer a scalable solution that could independently generate budget and forecasting projection scenarios across the entire portfolio.
Action:
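The engine behind a tool like this reduces to a small core: one projection function applied across every property and every growth assumption in a single pass, rather than one spreadsheet at a time. The property data, growth model, and names below are hypothetical:

```python
def project_revenue(units, rate, growth, months=12):
    """Project monthly revenue for one property under a growth scenario."""
    return [round(units * rate * (1 + growth) ** m, 2) for m in range(months)]

def run_scenarios(portfolio, scenarios, months=12):
    """Run every growth scenario across the whole portfolio in one pass."""
    return {
        (prop["name"], label): project_revenue(prop["units"], prop["rate"], g, months)
        for prop in portfolio
        for label, g in scenarios.items()
    }

# One toy property and two scenarios; the real tool consumed
# unit-level data for the full ~100-property portfolio.
portfolio = [{"name": "Elm St", "units": 100, "rate": 1500.0}]
results = run_scenarios(portfolio, {"base": 0.00, "bull": 0.02}, months=3)
```

Because each (property, scenario) projection is independent, this structure also parallelizes trivially if runtimes ever matter again.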
Impact: Engineered a high-velocity data acquisition pipeline to collect 34,000+ data points across 850+ competitor properties, reducing market research cycles from weeks to under eight minutes.
To facilitate deep competitive analysis across a ~100-property portfolio, I developed an automated scraping and ETL toolkit designed to extract 40+ attributes for over 850 multi-family apartment homes. The objective was to replace a fragmented manual research process with a unified, high-speed program capable of cleansing the data and loading it directly into internal storage.
The resulting architecture completes end-to-end execution, from raw collection to validated load, in less than eight minutes. Key technical highlights include:
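One piece of such a pipeline, parsing listing attributes out of raw HTML, can be sketched with the standard library's html.parser. The markup structure and class names here are hypothetical stand-ins for the real listing sites:

```python
from html.parser import HTMLParser

class ListingParser(HTMLParser):
    """Pull attribute name/value pairs out of a listing's spec elements."""

    def __init__(self):
        super().__init__()
        self._field = None
        self.record = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Hypothetical markup: <li class="spec" data-name="...">value</li>
        if attrs.get("class") == "spec":
            self._field = attrs.get("data-name")

    def handle_data(self, data):
        if self._field:
            self.record[self._field] = data.strip()
            self._field = None

html = '<li class="spec" data-name="beds">2</li><li class="spec" data-name="rent">$1,850</li>'
p = ListingParser()
p.feed(html)
```

In a full pipeline, records like `p.record` would then pass through type coercion and validation before being loaded into internal storage.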
Impact: Automated the lead prioritization workflow by integrating DataRobot AutoML, transforming unstructured chatbot data into a ranked follow-up queue that surfaces high-intent prospects first.
To showcase the efficiency of automated machine learning in sales operations, I built a proof-of-concept and technical tutorial utilizing DataRobot. The project addresses the operational challenge of high-volume chatbot interactions by automatically scoring sentiment and intent, allowing outreach teams to bypass manual lead triaging and focus on the most promising conversions.
The implementation and demonstration highlighted:
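DataRobot supplies the sentiment and intent scores in this setup; the downstream prioritization step can then be a simple weighted ranking over scored conversations. The field names and weights below are illustrative assumptions, not the project's actual configuration:

```python
def rank_leads(conversations, weights=(0.7, 0.3)):
    """Order chatbot conversations so high-intent, positive leads surface first."""
    w_intent, w_sent = weights
    return sorted(
        conversations,
        key=lambda c: -(w_intent * c["intent_score"] + w_sent * c["sentiment_score"]),
    )

# Scores in [0, 1] as a model-scoring step might emit them.
queue = rank_leads([
    {"lead": "A", "intent_score": 0.20, "sentiment_score": 0.90},
    {"lead": "B", "intent_score": 0.95, "sentiment_score": 0.60},
])
```

Outreach teams then work the queue top-down instead of triaging every conversation manually.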
This project is a case study in developing NLP applications for a low-resource corporate environment running a client-centric, service-based business model. I pretrained miniature BERT masked language models on a domain-adapted vocabulary sourced from client-facing research documents, and demonstrated modest improvements over baseline model performance when the models were fine-tuned to categorize client consultation requests by topic.
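The heart of masked-language-model pretraining is the token-masking step: hide a fraction of tokens and train the model to reconstruct them. A simplified sketch is below; full BERT additionally replaces some selected tokens with random or unchanged tokens, which this omits:

```python
import random

def mask_tokens(tokens, mask_token="[MASK]", rate=0.15, seed=0):
    """Randomly mask tokens for BERT-style masked-language-model pretraining."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < rate:
            masked.append(mask_token)
            labels.append(tok)   # the model must predict the original token
        else:
            masked.append(tok)
            labels.append(None)  # position ignored in the loss
    return masked, labels

masked, labels = mask_tokens("the client requested a portfolio review".split())
```

Because the labels come from the text itself, pretraining needs no manual annotation, which is exactly what makes the approach viable in a low-resource environment.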
I developed this toolkit to automate the collection of video recordings, recording metadata, and transcripts from a variety of video conferencing, video hosting, and transcription service platforms. I used these tools throughout my four years in client relationship management, remotely supporting a territory of several hundred clients.
As the lucky husband of the founder of The Beverly Collective, a Colorado-based art collective, I built this program to reduce the manual workload of sending biweekly sales report emails to the 30+ artists and makers vending through the collective. I completed the coding in under 5 hours and cut the monthly workload from 10 hours to roughly 2, now focused on email validation, payment processing, and vendor support. I leveraged the Gmail API to gather user permissions and create email drafts within the user's account, and ingested Excel files into the Python program using the OpenPyxl library.
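The Gmail API's drafts.create endpoint expects the outgoing message as a base64url-encoded "raw" payload, so the message-construction step can be shown with the standard library alone. The vendor name and amount below are made up, and the OAuth and API-client plumbing is omitted:

```python
import base64
from email.mime.text import MIMEText

def build_draft_body(to_addr, vendor, total):
    """Build the base64url 'raw' payload that Gmail's drafts.create expects."""
    msg = MIMEText(f"Hi {vendor},\n\nYour sales this period totaled ${total:.2f}.\n")
    msg["To"] = to_addr
    msg["Subject"] = f"Biweekly sales report - {vendor}"
    raw = base64.urlsafe_b64encode(msg.as_bytes()).decode()
    return {"message": {"raw": raw}}

draft = build_draft_body("maker@example.com", "Clay & Co", 412.5)
```

In the real program, a payload like this is passed to the authorized Gmail service's drafts-creation call, leaving a reviewable draft in the user's own account rather than sending blindly.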
This project is designed to extend personal diabetes data and insights into real-time streaming, IoT integrations, and predictive modeling. It builds on a foundation of diabetes data democratization facilitated by Nightscout, an open-source cloud application used by people with diabetes, providers, and caretakers to visualize, store, and share the data from their Continuous Glucose Monitoring sensors in real time.
Having recently established sensor data accessibility via a web-hosted MongoDB database, I am actively pursuing two aims with this project:
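Nightscout exposes CGM readings as entries whose sensor glucose values, in mg/dL, are stored under the "sgv" key. A small sketch of one downstream metric, time-in-range, over such entries follows (the sample values are made up):

```python
def time_in_range(entries, low=70, high=180):
    """Fraction of CGM readings within the target glucose range.

    Nightscout stores sensor glucose values in mg/dL under the 'sgv' key;
    entries without an 'sgv' (e.g. calibration records) are skipped.
    """
    sgvs = [e["sgv"] for e in entries if "sgv" in e]
    if not sgvs:
        return 0.0
    in_range = sum(low <= v <= high for v in sgvs)
    return in_range / len(sgvs)

# Four toy readings: two in range, one high, one low.
sample = [{"sgv": 110}, {"sgv": 250}, {"sgv": 95}, {"sgv": 65}]
tir = time_in_range(sample)
```

Metrics like this, computed over the MongoDB-backed entry stream, are the kind of building block the streaming and predictive-modeling aims above would consume.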
I co-authored a blog series hosted on Ultiworld Disc Golf predicting disc golf player performance at elite series events. I contributed player performance web scraping and GIS data collection capabilities, cleaned and preprocessed data, and edited post content. The scripts hosted in this repository demonstrate some of the larger data collection efforts feeding parts of the model. This was my first time ever using Python, and I am in the process of revisiting the files to spruce up the content.
The blog posts are available on the Ultiworld Disc Golf website.