<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Herman Wandabwa on Medium]]></title>
        <description><![CDATA[Stories by Herman Wandabwa on Medium]]></description>
        <link>https://medium.com/@hermanwandabwa?source=rss-58e995b6d0e3------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/2*NtahlS4LHDjwlEGqWqFLXw.jpeg</url>
            <title>Stories by Herman Wandabwa on Medium</title>
            <link>https://medium.com/@hermanwandabwa?source=rss-58e995b6d0e3------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Mon, 11 May 2026 09:03:41 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@hermanwandabwa/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Predicting Daily Patient Volume for a Melbourne Urgent Care Clinic]]></title>
            <link>https://medium.com/data-science-collective/predicting-daily-patient-volume-for-a-melbourne-urgent-care-clinic-d4ca30007991?source=rss-58e995b6d0e3------2</link>
            <guid isPermaLink="false">https://medium.com/p/d4ca30007991</guid>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[time-series-forecasting]]></category>
            <category><![CDATA[healthcare]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[Herman Wandabwa]]></dc:creator>
            <pubDate>Sun, 10 May 2026 21:35:11 GMT</pubDate>
            <atom:updated>2026-05-10T21:35:11.115Z</atom:updated>
            <content:encoded><![CDATA[<h4><strong>How I built an XGBoost forecasting pipeline to turn patient demand into clinician rostering decisions</strong></h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*jHP1e7eUombb6q3O--jmBA.png" /><figcaption><em>Image generated by the author</em></figcaption></figure><p><strong>Author’s note:</strong> <em>This article uses independently created code and synthetic data for illustration. The methods described are standard forecasting and machine learning techniques commonly used across many application areas.</em></p><p><em>Stuck behind a paywall? Read the article for free </em><a href="https://hermanwandabwa.medium.com/d4ca30007991?source=friends_link&amp;sk=2b459f524ae72fe9eea37a9a96ea971a"><strong><em>HERE</em></strong></a>.</p><p>Running a walk-in urgent care clinic comes with a recurring operational headache: figuring out how many clinicians to roster on any given day. Roster too few, and the waiting room backs up and patients drift to the nearest Emergency Department (ED). Roster too many, and you’re paying for clinician hours you don’t need. Most clinics estimate demand from a mix of last year&#39;s numbers, gut feel, and a bit of guesswork, especially around flu season. That’s the problem I tackle in this article: turning urgent care demand into something that can actually be forecast, day by day, with a prediction interval that a rostering manager can plan against.</p><p>Forecasting urgent care demand is not the same as forecasting foot traffic at a shopping center. If a mall misses the mark by 50 customers on a Tuesday, not much happens: maybe the food court is overstaffed or the budget takes a small hit. In urgent care, the cost of being wrong is heavier. Understaffing means longer waits for people who may be genuinely unwell. 
Overstaffing creates the opposite problem: it burns money that thin-margin clinics may not have. So this isn’t just a neat forecasting exercise but an operational problem with real consequences.</p><p>The clinic I’m building this around is a walk-in urgent care clinic in Melbourne. The city is a good fit for the exercise (and, hey, I live here too). Melbourne’s health demand drivers are unusually rich, from Victorian public holidays (when EDs typically see more patients because GPs are closed) to the Southern Hemisphere winter flu season. The city also has an October–November hay fever window and occasional freak events like the <a href="https://knowledge.aidr.org.au/resources/storm-thunderstorm-asthma-victoria/">2016 thunderstorm asthma event</a> that overwhelmed emergency services across the city.</p><p>I’ve split the article into two parts. Part 1 (this one) covers <em>synthetic data generation, feature engineering, model training,</em> and <em>evaluation</em>. Part 2 will cover deployment: a FastAPI backend serving predictions over REST, a React + Tailwind dashboard for the rostering team, and a Supabase layer for logging predictions against real outcomes. As always, all code is open-sourced <a href="https://github.com/wandabwa2004/urgent_care_forecast/tree/main/notebooks">HERE</a>. Clone it and adapt it for your own clinic or venue.</p><h4><strong>1. The Problem and Why Prediction Matters</strong></h4><p>Walk-in urgent care clinics sit awkwardly between general practice and emergency departments. They handle complaints that can’t wait three days for a GP appointment but don’t quite need an ambulance, such as sprains, cuts, chest infections, children’s fevers, minor fractures, and post-injury follow-ups. That mix makes their demand erratic. 
For example, on a typical Tuesday a clinic might see 90 patients, whereas numbers can hit 200 on Easter Monday, half of them people whose regular GP was closed for the holiday.</p><p>A well-designed predictive model quantifies this upfront. It should output a point forecast, a prediction interval, and a staffing tier that the rostering manager can rely on and act on. With such a model in place, the question shifts from “<em>How many patients will walk through the door?</em>” to “<em>Do we need the Low, Medium, or High clinician roster today?</em>” Let&#39;s walk through what I built and how I did it.</p><h4><strong>2. Dataset Design &amp; Simulation</strong></h4><p>I didn’t have access to real clinic records (and wouldn’t publish them if I did), so I generated a synthetic dataset that reflects realistic urgent care patterns in Melbourne. Synthetic data helps with more than privacy: it lets you control the signal-to-noise ratio and demonstrate the methodology cleanly before applying it to messy real-world data.</p><p>The dataset spans 1 January 2023 to 31 December 2025, giving 1,096 daily records (2024 is a leap year).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*MozAmkfQmRp3pYNssMCqHw.png" /><figcaption>Daily Patient Volume over Time</figcaption></figure><p>The light line shows raw daily patient counts, while the red line shows the 30-day moving average. 
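</p><p>The date skeleton behind this chart, plus its 30-day smoothing, can be sketched in a few lines of pandas. This is a minimal illustration rather than the repository’s generator. The column names mirror the ones used in this article, the placeholder target only applies the day-of-week multipliers from the simulation code, and I’ve assumed a trailing (not centred) 30-day window.</p><pre>
import pandas as pd

# One row per day from 1 January 2023 to 31 December 2025 (2024 is a leap year)
dates = pd.date_range("2023-01-01", "2025-12-31", freq="D")
df = pd.DataFrame({"date": dates})

# Temporal columns; pandas numbers days 0 = Monday through 6 = Sunday
df["day_of_week"] = df["date"].dt.dayofweek
df["month"] = df["date"].dt.month
df["quarter"] = df["date"].dt.quarter
df["week_of_year"] = df["date"].dt.isocalendar().week.astype(int)

# Placeholder target: base of 100 scaled by the day-of-week multipliers
day_multiplier = {0: 1.15, 1: 1.00, 2: 0.95, 3: 0.95, 4: 1.00, 5: 0.85, 6: 0.75}
df["patients"] = 100.0 * df["day_of_week"].map(day_multiplier)

# Trailing 30-day moving average: the first 29 days have no full window (NaN)
df["patients_ma30"] = df["patients"].rolling(window=30).mean()

print(len(df))  # 1096
</pre><p>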
The smoothed line makes the synthetic demand pattern easier to read: flu-driven winter peaks, a softer spring rise from hay fever, quieter summer months, and occasional spikes from thunderstorm asthma events or heatwaves.</p><p>Each record captures the following:</p><ul><li><strong>Temporal features</strong>: day of week, month, quarter, week of year</li><li><strong>Weather</strong>: temperature (Melbourne-specific ranges), precipitation (mm), relative humidity, weather type (Sunny / Partly Cloudy / Cloudy / Rainy)</li><li><strong>Calendar events</strong>: Victorian public holidays, Victorian school holidays, and crucially, the <em>day after</em> a public holiday</li><li><strong>Health and extreme-weather drivers</strong>: winter flu season with a July peak, hay fever season (October to November), rare thunderstorm-asthma events, and temperature-extreme flags for days above 35°C or below 5°C</li></ul><p>It&#39;s worth pointing out that several of these drivers have the <em>opposite effect</em> to the one they’d have in other contexts. For example, rain tends to <em>reduce</em> visits to entertainment venues but <em>slightly increases</em> urgent care demand (slips, falls, and people who can’t put off that chest infection any longer). I spent a good amount of time on these sign choices because getting them wrong would produce a plausible-looking dataset that models train on happily, but with the learned relationships backwards. This is exactly how a seemingly validated model ends up giving bad rostering advice in production.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*3D5pV3nAUGiQftC6GtGXLw.png" /><figcaption>Weather and Illness Driver Impact on Patient Volume</figcaption></figure><p><em>Left: mean daily patients by weather type. Rainy days run slightly above sunny days. Middle: patients vs. temperature. Right: pollen index vs. 
patients, flat until the index hits ~7, then climbs sharply (the hay-fever-to-asthma pathway).</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*43BTzblksSneo769lSoE5A.png" /><figcaption>Temporal Patterns in Patient Volume</figcaption></figure><p>Above are four views of the same temporal story.</p><p>The plots suggest a strong calendar effect. Mondays are the busiest day, sitting about 15% above midweek levels. July and August are the peak months, which lines up with winter pressure, and Q3 has the highest quarterly demand. In the heatmap, Monday in July is the busiest day-month combination.</p><p>The patient count itself is generated using <a href="https://www.sciencedirect.com/topics/computer-science/multiplicative-factor">multiplicative factors</a> applied to a base of 100 patients/day:</p><pre># Base demand before any multipliers<br>df[&#39;patients&#39;] = 100.0<br># Day of week effect (0 = Monday; Monday is peak due to weekend GP backlog)<br>day_multiplier = {0: 1.15, 1: 1.00, 2: 0.95, 3: 0.95, 4: 1.00, 5: 0.85, 6: 0.75}<br>df[&#39;patients&#39;] *= df[&#39;day_of_week&#39;].map(day_multiplier)<br># Seasonal effect (Southern Hemisphere: winter = flu season)<br>season_multiplier = {<br>    1: 0.85, 2: 0.90, 3: 0.95, 4: 1.00, 5: 1.10, 6: 1.30,<br>    7: 1.45, 8: 1.35, 9: 1.15, 10: 1.10, 11: 1.05, 12: 0.90,<br>}<br>df[&#39;patients&#39;] *= df[&#39;month&#39;].map(season_multiplier)<br># Calendar effects. NOTE: public holidays INCREASE demand here (GPs closed)<br>df.loc[df[&#39;is_public_holiday&#39;] == 1, &#39;patients&#39;] *= 1.40<br>df.loc[df[&#39;is_day_after_public_holiday&#39;] == 1, &#39;patients&#39;] *= 1.22<br>df.loc[df[&#39;is_school_holiday&#39;] == 1, &#39;patients&#39;] *= 1.15<br># Epidemiological drivers<br>df.loc[df[&#39;is_flu_peak&#39;] == 1, &#39;patients&#39;] *= 1.50<br>df.loc[df[&#39;is_hayfever_season&#39;] == 1, &#39;patients&#39;] *= 1.05<br>df.loc[df[&#39;is_thunderstorm_asthma&#39;] == 1, &#39;patients&#39;] *= 2.30<br># Weather impact (rain raises demand here, unlike leisure-venue forecasts)<br>df[&#39;weather_factor&#39;] = 1.0  # neutral default; otherwise non-rainy days become NaN<br>df.loc[df[&#39;weather_type&#39;] == &#39;Rainy&#39;, &#39;weather_factor&#39;] = 1.08<br>df.loc[df[&#39;temp_extreme_hot&#39;] == 1, &#39;weather_factor&#39;] *= 1.25<br>df.loc[df[&#39;temp_extreme_cold&#39;] == 1, &#39;weather_factor&#39;] *= 1.20<br>df[&#39;patients&#39;] *= df[&#39;weather_factor&#39;]</pre>
<p>Each factor leaves a measurable uplift in the final data, quantified in the chart below.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ZE_m3Ahp7CokNJIgcCzRBg.png" /><figcaption>Impact of Key Factors on Patient Volume</figcaption></figure><p>Each panel compares the mean patient count with and without the factor active. Flu peak (+54%), public holidays (+40%), and the day after a public holiday (+22%) dominate. Hay fever season has a small but consistent lift (+5%): not every hay fever day leads to asthma presentations, but across the full two-month window the effect adds up.</p><p>To make the synthetic demand less perfectly behaved, I added Gaussian noise with a standard deviation equal to 12% of expected daily traffic, a 2% outlier rate where traffic spikes to 1.5–2.5× normal levels, and a small daily growth trend of about 0.01% to reflect a slowly expanding catchment population. The final values are clipped between 30 and 500 patients/day, a realistic operating range for a single walk-in clinic. Remember, this is simulated data and should NOT be used as a replacement for real clinic records.</p><p>The resulting distribution has the right-skewed shape typical of event-driven demand. 
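</p><p>The noise recipe described above can be sketched as a small helper. Treat it as an illustration under stated assumptions rather than the repository’s code. The function name add_realistic_noise is hypothetical, and the order of operations (growth trend, then Gaussian noise, then outliers, then rounding and clipping) is my assumption. Even with a flat expected demand of 100 patients/day, the outlier step is enough to produce a right-skewed distribution.</p><pre>
import numpy as np
import pandas as pd


def add_realistic_noise(expected, seed=42):
    """Apply growth, Gaussian noise, outliers, and clipping to expected daily demand.

    Hypothetical helper; the order of the steps is an assumption.
    """
    rng = np.random.default_rng(seed)
    patients = np.asarray(expected, dtype=float).copy()
    n = len(patients)

    # Slow catchment growth of about 0.01% per day, compounded
    patients *= 1.0001 ** np.arange(n)

    # Gaussian noise with a standard deviation of 12% of expected traffic
    patients += rng.normal(0.0, 0.12 * patients)

    # Roughly 2% of days become outliers at 1.5x to 2.5x normal traffic
    n_outliers = int(round(0.02 * n))
    outlier_idx = rng.choice(n, size=n_outliers, replace=False)
    patients[outlier_idx] *= rng.uniform(1.5, 2.5, size=n_outliers)

    # Whole patients, clipped to a plausible single-clinic range
    return np.clip(np.round(patients), 30, 500).astype(int)


dates = pd.date_range("2023-01-01", "2025-12-31", freq="D")
expected = np.full(len(dates), 100.0)  # hypothetical flat expected demand
noisy = pd.Series(add_realistic_noise(expected), index=dates)
print(noisy.describe())
print(f"skewness: {noisy.skew():.2f}")
</pre><p>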
Most days cluster around 100–150 patients, with a long tail of flu-season weekdays and public-holiday spikes that push above 250.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*wgrRzFnXgqZIvF8pnSxxVg.png" /><figcaption>Right-skewed distribution of raw patient counts</figcaption></figure><p>The histogram shows skewness ≈ 1.14, with the mean sitting above the median and a visible right tail of high-volume days.</p><h4><strong>3. Feature Engineering</strong></h4><p>I’ve made the same point in previous articles and will keep making it: raw data gets you baseline predictions, while feature-engineered data gets you good ones. Starting from 31 raw columns, I engineered 87 features grouped into eight categories:</p><pre>Feature groups:<br>  Temporal    :  17<br>  Calendar    :   3<br>  Season      :   4<br>  Weather     :  11<br>  Epidemio    :   5<br>  Interaction :   6<br>  Lag         :  14<br>  Rolling     :  27<br>  Total       :  87</pre><p>Most of the techniques here (cyclical encoding, lag features, and rolling statistics) are standard, so I’ll touch on those briefly and linger on the ones specific to the urgent care domain.</p><p><strong>a) Cyclical Encoding: <em>Why Day 0 and Day 6 Should Be “Close”</em></strong></p><p>Here’s the subtle problem with raw temporal features. If you feed <em>day_of_week</em> directly into a model as values 0 to 6, the model treats Monday (0) and Sunday (6) as far apart, even though they’re adjacent days. The same applies to months. 
December (12) and January (1) are neighbours, not opposite ends of a scale.</p><p>Cyclical encoding with <em>sin/cos</em> transforms is a standard workaround:</p><pre>import numpy as np<br><br># Day of week (period 7)<br>df[&#39;dow_sin&#39;] = np.sin(2 * np.pi * df[&#39;day_of_week&#39;] / 7)<br>df[&#39;dow_cos&#39;] = np.cos(2 * np.pi * df[&#39;day_of_week&#39;] / 7)<br># Month (period 12), shifted so January maps to angle 0<br>df[&#39;month_sin&#39;] = np.sin(2 * np.pi * (df[&#39;month&#39;] - 1) / 12)<br>df[&#39;month_cos&#39;] = np.cos(2 * np.pi * (df[&#39;month&#39;] - 1) / 12)<br># Day of year (period 365)<br>df[&#39;doy&#39;] = df[&#39;date&#39;].dt.dayofyear<br>df[&#39;doy_sin&#39;] = np.sin(2 * np.pi * df[&#39;doy&#39;] / 365)<br>df[&#39;doy_cos&#39;] = np.cos(2 * np.pi * df[&#39;doy&#39;] / 365)</pre><p>A helpful way to think about this is to picture each day placed on the face of a clock: Monday and Sunday end up right next to each other, and December sits beside January. Both <em>sin</em> and <em>cos</em> components are needed because <em>sin</em> alone maps different positions to the same value. With the month encoding above, for example, February and June both have a <em>sin</em> value of 0.5, but their <em>cos</em> values are about 0.87 and −0.87.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*voK-_49j1FPp-wfD6JP05Q.png" /><figcaption>Cyclical Encoding</figcaption></figure><p>On the left, days of the week are mapped onto a circle via <em>sin/cos</em>. On the right, months are mapped with the same circular encoding. Together, the two components identify each position on the circle uniquely.</p><p>I applied this to <em>day-of-week, month, day-of-year,</em> and <em>week-of-year</em>.</p><p><strong>b) Lag Features: <em>“Yesterday Predicts Tomorrow”</em></strong></p><p>Recent history turned out to be one of the strongest predictive signals in this dataset. If 150 patients walked in yesterday, today’s number is more likely to sit near 150 than near 80. 
I chose lag intervals that correspond to real rhythms: short-term momentum, the weekly cycle, and fortnightly, monthly, and seasonal patterns.</p><pre># Point lags: momentum (1-3), weekly (7), fortnightly/monthly (14-28), seasonal (60, 90)<br>for lag in [1, 2, 3, 7, 14, 21, 28, 60, 90]:<br>    df[f&#39;patients_lag_{lag}&#39;] = df[&#39;patients&#39;].shift(lag)<br><br># Same-weekday average over the last 4 occurrences (lags 7, 14, 21, 28)<br>same_dow_lags = [f&#39;patients_lag_{lag}&#39; for lag in [7, 14, 21, 28]]<br>df[&#39;mean_last_4_same_dow&#39;] = df[same_dow_lags].mean(axis=1)</pre><p>Why these specific intervals? Lags 1–3 capture short-term momentum. Lag 7 captures the same-day-last-week pattern (which matters enormously here, since Mondays and Sundays behave very differently). Lags 14, 21, and 28 capture fortnightly and monthly rhythms, and lags 60 and 90 capture seasonal trends. The same-weekday average is particularly powerful in urgent care because it answers a clinically grounded question, “<em>How many patients typically walk in on a Monday?</em>”, which is very different from “<em>How many typically walk in on a Sunday?</em>” Mondays in this dataset run ~50% above Sundays.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*83U14uyBdmEJD_LjM-EsIg.png" /><figcaption>Autocorrelation Function</figcaption></figure><p>The ACF shows significant autocorrelation at multiples of 7 days (the weekly cycle) and a slower decay that persists out past 60 days (the seasonal structure). 
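</p><p>If you want to check the weekly spike numerically rather than by eye, pandas’ Series.autocorr(lag) returns the autocorrelation at a single lag. The sketch below uses a hypothetical stand-in series (the day-of-week multipliers from the simulation plus Gaussian noise) rather than the article’s dataset, so the exact values will differ. The point is that lag 7 and its multiples stand out while lags 1 and 3 stay small.</p><pre>
import numpy as np
import pandas as pd

# Hypothetical stand-in for the daily series: weekly pattern plus Gaussian noise
rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-01", "2025-12-31", freq="D")
day_multiplier = {0: 1.15, 1: 1.00, 2: 0.95, 3: 0.95, 4: 1.00, 5: 0.85, 6: 0.75}
weekly = np.array([day_multiplier[d] for d in dates.dayofweek])
patients = pd.Series(100 * weekly + rng.normal(0, 5, len(dates)), index=dates)

# Autocorrelation at a handful of lags; multiples of 7 should stand out
acf_by_lag = {lag: patients.autocorr(lag=lag) for lag in [1, 3, 7, 14, 21, 28]}
for lag, value in acf_by_lag.items():
    print(f"lag {lag:2d}: {value:+.2f}")
</pre><p>For the full ACF and PACF plots, statsmodels provides plot_acf and plot_pacf in statsmodels.graphics.tsaplots.</p><p>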
The PACF highlights lags 1 and 7 as carrying the most independent predictive information, and both are included in the feature set.</p><p><strong>c) Rolling Window Statistics</strong></p><p>Beyond point lags, I computed rolling means, standard deviations, maxima, and minima over six windows (3 to 90 days), plus exponentially weighted moving averages (EWMAs):</p><pre># shift(1) so every window ends the day BEFORE the prediction date<br>past = df[&#39;patients&#39;].shift(1)<br>for window in [3, 7, 14, 30, 60, 90]:<br>    rolling = past.rolling(window)<br>    df[f&#39;rolling_mean_{window}d&#39;] = rolling.mean()<br>    df[f&#39;rolling_std_{window}d&#39;] = rolling.std()<br>    df[f&#39;rolling_max_{window}d&#39;] = rolling.max()<br>    df[f&#39;rolling_min_{window}d&#39;] = rolling.min()<br><br># EWMAs put more weight on recent days<br>for span in [7, 14, 30]:<br>    df[f&#39;ewma_{span}d&#39;] = past.ewm(span=span, adjust=False).mean()</pre><p>The shift(1) is critical because it prevents data leakage: each window only uses information available <em>before</em> the prediction date. Without it, every rolling window would include the very day being predicted, and test metrics would look amazing until the model hits production and falls apart. The EWMA variants give extra weight to recent observations, which is useful because patterns can shift gradually (think of the ramp-up at the start of flu season).</p><p><strong>d) Interaction Features</strong></p><p>Urgent care demand is often driven by <em>combinations</em> of factors rather than individual ones. A <em>rainy Monday</em> during <em>flu peak</em> is a fundamentally different day from <em>a sunny Tuesday</em> in <em>February</em>, because the effects compound rather than simply add up. 
I captured the most clinically plausible combinations as follows (the list is easy to extend):</p><pre>df[&#39;ph_x_monday&#39;] = df[&#39;is_public_holiday&#39;] * df[&#39;is_monday&#39;]<br>df[&#39;flu_peak_x_rainy&#39;] = df[&#39;is_flu_peak&#39;] * (df[&#39;weather_type&#39;] == &#39;Rainy&#39;).astype(int)<br>df[&#39;hayfever_x_high_pollen&#39;] = df[&#39;is_hayfever_season&#39;] * (df[&#39;pollen_index&#39;] &gt;= 7).astype(int)<br>df[&#39;school_holiday_x_weekend&#39;] = df[&#39;is_school_holiday&#39;] * df[&#39;is_weekend&#39;]<br>df[&#39;extreme_cold_x_flu_season&#39;] = df[&#39;temp_extreme_cold&#39;] * df[&#39;is_flu_season&#39;]<br>df[&#39;day_after_ph_x_monday&#39;] = df[&#39;is_day_after_public_holiday&#39;] * df[&#39;is_monday&#39;]<br><br># Count of concurrent illness drivers active on each day<br>df[&#39;illness_driver_count&#39;] = (<br>    df[&#39;is_flu_season&#39;] + df[&#39;is_hayfever_season&#39;]<br>    + df[&#39;is_day_after_public_holiday&#39;]<br>    + df[&#39;temp_extreme_hot&#39;] + df[&#39;temp_extreme_cold&#39;]<br>)</pre><p>The illness_driver_count feature captures a simple idea: <em>how many illness-related factors are active on a given day</em>. A day with two active drivers is fundamentally different from a day with none, and average demand rises with each additional driver.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/889/1*LIn6kbaMUfU1De5muuLpmw.png" /><figcaption>Average Patients by Illness Driver Count</figcaption></figure><p>The plot above shows a clear stacking effect. Days with no illness drivers average about 111 patients across 605 days, while those with one driver average about 153 patients across 475 days. Days with two concurrent drivers are rare, appearing only 16 times in the three-year window, but they average about 175 patients.</p><p><strong>e) Melbourne Seasons</strong></p><p>Melbourne’s seasonal pattern is flipped relative to Northern Hemisphere cities. 
December is summer, not winter, and if the model gets that wrong, its seasonal assumptions will point in the wrong direction.</p><pre># Southern Hemisphere meteorological seasons<br>def get_season(month):<br>    if month in [12, 1, 2]:<br>        return &#39;Summer&#39;<br>    if month in [3, 4, 5]:<br>        return &#39;Autumn&#39;<br>    if month in [6, 7, 8]:<br>        return &#39;Winter&#39;<br>    return &#39;Spring&#39;</pre><p>In urgent care, seasonal alignment matters because Melbourne’s winter months, June to August, map directly to flu season. If the model encodes seasonality incorrectly, the flu-related demand signals get diluted across the wrong part of the calendar.</p><p><strong>f) Log-Transform of the Target</strong></p><p>Raw patient counts are right-skewed (skewness ≈ 1.14). Applying log1p, i.e. log(1 + x), brings the distribution much closer to normal and stops the models from overweighting outlier days at the expense of typical ones:</p><pre>df[&#39;patients_log&#39;] = np.log1p(df[&#39;patients&#39;])</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*vtdFIPWQqwoKLSydkwgZMw.png" /><figcaption>Log Transformation on Patient Counts</figcaption></figure><p>The left plot shows the raw target distribution, with a skewness of 1.14 and a clear right tail from high-volume days. After applying log1p, skewness drops to 0.30, and the distribution becomes much closer to normal. The models train on this transformed target, and their predictions are converted back to patient counts using np.expm1() before being returned. 
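</p><p>A quick way to see both points above, the skew reduction and the exact inverse, is to run log1p and expm1 on a right-skewed sample. The sketch below uses hypothetical log-normal counts instead of the article’s data, so the skewness values will not match 1.14 and 0.30. The last line also hints at why back-transformed predictions can run low: expm1 of the mean in log space sits below the mean of the raw counts.</p><pre>
import numpy as np
import pandas as pd

# Hypothetical right-skewed daily counts (log-normal around ~120 patients)
rng = np.random.default_rng(1)
patients = pd.Series(np.round(rng.lognormal(mean=np.log(120), sigma=0.3, size=1000)))

# Training target: log1p(x) = log(1 + x), safe even for zero counts
patients_log = np.log1p(patients)
print(f"skew raw: {patients.skew():.2f}, skew log1p: {patients_log.skew():.2f}")

# expm1 exactly inverts log1p, so predictions map back to patient counts
recovered = np.expm1(patients_log)
assert np.allclose(recovered, patients)

# Back-transforming the log-space mean lands below the raw mean (Jensen's inequality)
print(f"raw mean: {patients.mean():.1f}, expm1(mean of log1p): {np.expm1(patients_log.mean()):.1f}")
</pre><p>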
I’ll revisit this back-transformation in Part 2 when discussing the production API.</p><p>With all 87 features engineered, the next step was to see which ones actually carried signal against the target.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/889/1*MGuOKmDLP-cBklZzhlWTpQ.png" /><figcaption>Top 30 Features by Correlation with Patient Count</figcaption></figure><p>The rolling 7- and 14-day averages (such as <em>rolling_mean_7d</em>) cluster at the top, with correlations between about 0.61 and 0.69.</p><p>The longest lag feature looks back 90 days, so the first 90 rows had to be removed after feature engineering. That left about 1,006 usable observations, enough for model training and a separate 90-day holdout set.</p><h4><strong>4. Model Training &amp; Comparison</strong></h4><p>With the features ready, I trained three machine learning models and compared them against three naive baselines. Before that, I needed to get the split strategy right.</p><p><strong>Train/Test Split: Respecting Time</strong></p><p>With time series data, random train/test splits don’t work well. If you shuffle the dates, the model can learn from future patterns while being evaluated on earlier ones. That is data leakage, and it makes the metrics look better than they really are.</p><p>I kept the split chronological: the final 90 days became the holdout test set, and everything before that was used for training.</p><pre>Train: 915 rows (2023-04-01 → 2025-10-02)<br>Test : 90 rows (2025-10-03 → 2025-12-31)</pre><p>For cross-validation, I used TimeSeriesSplit with five expanding-window folds. Each fold trained only on past data and validated on future data, keeping the evaluation aligned with how the model would be used in practice.</p><p><strong>The Models</strong></p><p>Before reaching for more sophisticated algorithms, I started with naive baselines. These are simple prediction strategies that rely on arithmetic rather than learned patterns. 
If a more complex model cannot beat them, it&#39;s not earning its complexity.</p><p>I used three naive baselines, each slightly more informed than the last.</p><ul><li><strong>Global mean</strong>: This predicts every day as the training-set average, about 130 patients. It ignores everything, including day of week, weather, flu season, and public holidays. It is the “shrug” prediction, but it anchors the lower end of the performance range.</li><li><strong>Last week’s count (lag 7)</strong>: This predicts today’s patient volume using the count from the same day last week. It captures the weekly rhythm but nothing else, so it cannot adapt to flu peaks, weather changes, school holidays, or public holidays.</li><li><strong>7-day moving average</strong>: This predicts today’s volume as the average of the previous seven days. It is smoother than the lag-7 baseline but still blind to why demand is changing: it follows recent drift without understanding the drivers behind it.</li></ul><p>The shared limitation of all three is that each relies on a single signal and can’t combine information. <em>A sunny Saturday during school holidays with a thunderstorm-asthma alert</em> is nothing like a <em>rainy Tuesday in summer</em>, and none of these baselines can tell the difference. That’s the gap ML fills.</p><p>I used three machine learning algorithms for this problem.</p><ol><li><strong>Random Forest</strong> used 300 decision trees, each trained on a random (bootstrap) sample of the rows and considering a random subset of features at each split (max_features=&#39;sqrt&#39;). Individual trees can learn rules such as <em>“If it is Monday, flu activity is high, and temperature is below 10°C, expect higher patient volume.”</em> The final prediction averages the output from all 300 trees, which smooths out the quirks of any single tree. I also set min_samples_leaf=5 to stop the trees from memorising small pockets of noise. 
Like XGBoost below, Random Forest was trained on the log-transformed target.</li><li><strong>XGBoost</strong> works differently. Instead of growing trees independently, it builds them in sequence, with each new tree fitting the residual errors left by the trees before it. In short, Random Forest averages many independent trees, while XGBoost accumulates small corrections one tree at a time. I used 500 trees and a learning rate of 0.05, so each tree only makes a small correction; a low learning rate usually needs more trees but tends to produce a more stable model. I trained XGBoost on log1p(patients) so that high-volume days did not dominate the loss function.</li><li><strong>Prophet</strong> (the open-source time series library released by Facebook) takes a different angle. Instead of relying on the hand-built features, it decomposes the time series into trend, seasonality, and holiday effects, then fits those components directly. I used multiplicative seasonality so the seasonal swings could scale with the baseline level of demand. I also added external regressors for public holidays, flu peak, pollen index, and weather.</li></ol><p>Here is the XGBoost training block.</p><pre>import numpy as np<br>import xgboost as xgb<br><br>xgb_model = xgb.XGBRegressor(<br>    n_estimators=500,       # number of boosted trees<br>    learning_rate=0.05,     # small correction per tree<br>    max_depth=6,<br>    subsample=0.8,          # fraction of rows sampled per tree<br>    colsample_bytree=0.8,   # fraction of features sampled per tree<br>    min_child_weight=5,<br>    reg_alpha=0.1,          # L1 regularisation<br>    reg_lambda=1.0,         # L2 regularisation<br>    random_state=42,<br>    n_jobs=-1,<br>    verbosity=0,<br>)<br># Train on the log-transformed target (y_train_log = np.log1p(y_train))<br>xgb_model.fit(X_train, y_train_log)<br># Predictions come back in log space, so invert with expm1<br>xgb_preds = np.expm1(xgb_model.predict(X_test))</pre><p>The code for the other two approaches can be found <a href="https://github.com/wandabwa2004/urgent_care_forecast/blob/main/notebooks/04_modeling.ipynb">HERE</a>.</p><h4><strong>Results</strong></h4><p>XGBoost produced the best results across all metrics. 
Its MAPE of 11.89% means that, on an average day of about 130 patients, the typical forecast error is roughly 15 patients (11.89% × 130 ≈ 15.5). The point is not that the model predicts every day exactly, but that it provides a reliable enough signal to support staffing decisions.</p><pre>| Model                | MAE   | RMSE  | R²    | MAPE   |<br>|----------------------|-------|-------|-------|--------|<br>| XGBoost              | 17.04 | 27.69 |  0.30 | 11.89% |<br>| Prophet              | 17.98 | 28.98 |  0.23 | 12.84% |<br>| Random Forest        | 18.34 | 29.35 |  0.21 | 12.91% |<br>| Baseline (7d MA)     | 22.93 | 32.36 |  0.04 | 17.18% |<br>| Baseline (mean)      | 24.04 | 33.04 | -0.00 | 18.06% |<br>| Baseline (last week) | 24.09 | 35.87 | -0.18 | 17.22% |</pre><p>The improvement over the baselines is substantial (MAE and RMSE are in patients per day). The best baseline (7-day moving average) has an MAE of 22.93 patients, while XGBoost achieves 17.04, a 26% reduction in error. MAPE drops from 17.2% to 11.9%, and that gap is the difference between useful predictions and educated guesses.</p><p>Two things are worth calling out in these numbers:</p><ul><li>The negative <strong>R²</strong> for the last-week baseline is not a bug. It means that baseline performs <em>worse</em> than simply predicting the mean of the test set. Baselines aren’t guaranteed to beat a constant mean, and that’s part of why they’re useful: they calibrate expectations.</li><li>An R² of about 0.30 looks modest, but that is partly a signal-to-noise issue. The simulation injects Gaussian noise (12% of expected traffic) and random outliers that no feature can predict, and real daily urgent care volumes have plenty of short-term variation that calendar, weather, and epidemiological features cannot fully explain. MAPE is more useful here because it measures the typical percentage error in the forecast. 
At 11.9%, it gives a clearer view of whether the model is useful for staff planning.</li></ul><p>The chart below compares actual and predicted patient counts on the test set:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*oOoSo7HNtY8gbj-poGsMvQ.png" /><figcaption>Actual vs Predicted Patient Counts</figcaption></figure><p>XGBoost tracks the observed series most closely, capturing both the weekly cycle and the seasonal movement from October to December. Random Forest follows the overall shape but underestimates peak demand. Prophet captures the broad seasonal movement but smooths over short-term variation because it relies more on decomposed trend and seasonality than on the engineered feature set used by the tree-based models.</p><h4><strong>5. Model Evaluation &amp; SHAP Explainability</strong></h4><p>Headline metrics tell you <em>how well</em> the model performs; deeper evaluation tells you <em>where it struggles and why</em>, which is what matters when you’re deciding whether to trust it.</p><p><strong>Residual Analysis</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*TtB7qRfWZRnXfUSnAq5eRg.png" /><figcaption>Residual Analysis</figcaption></figure><p>Overall, the residuals look sensible. They are roughly centred around zero, with no obvious time-based drift across the holdout period. The predicted-versus-actual plot tracks the diagonal reasonably well, although the model still under-predicts some of the highest-volume days. Most percentage errors fall within about ±20%, with a slight negative skew.</p><p>The model has a small negative bias of about 6.5 patients per day, meaning it tends to under-predict demand slightly, and that matters for rostering. Part of this bias is likely a side effect of training on log1p(patients): back-transforming a log-space prediction with expm1 tends to estimate something closer to the median than the mean, and for right-skewed counts the median sits below the mean. Understaffing is riskier than modest overstaffing, so the production pipeline should not treat the point forecast as the final staffing number. 
A safer approach is to roster closer to the upper bound of the 80% prediction interval.</p><p><strong>SHapley Additive exPlanations (SHAP) Explainability</strong></p><p>I used SHAP to understand what was driving the model’s predictions. This mattered for two reasons. First, it helped check whether the model was relying on clinically sensible signals rather than spurious patterns. Second, it gave me a concrete way to explain the forecasts to clinical directors and rostering managers when they ask why the model expects demand to rise or fall.</p><pre>import shap<br><br># model: the fitted XGBoost regressor; X_shap: the holdout feature matrix<br>explainer = shap.TreeExplainer(model)<br>shap_values = explainer.shap_values(X_shap)  # one SHAP value per feature per row<br># Beeswarm-style summary of the 20 most influential features<br>shap.summary_plot(shap_values, X_shap, plot_type=&#39;dot&#39;, max_display=20)</pre><p>The top features line up with what a clinician would expect:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/939/1*otHwKyAlyFsdTSP1Ffhahw.png" /><figcaption>SHAP Values</figcaption></figure><p>The figure above is a SHAP beeswarm plot showing how each feature pushes predictions up or down across the holdout set. Each dot is one observation, placed by its SHAP value and coloured by the feature value, with red indicating higher values and blue indicating lower values. The features is_flu_peak, illness_driver_count, and patients_lag_7 dominate the top of the ranking, suggesting the model relies heavily on epidemiological pressure and recent demand. Other lag and rolling-window features also contribute meaningfully but sit further down the ranking.</p><p>I also generated a waterfall plot for the single worst prediction day. 
This is the kind of plot you want ready when a clinical director asks, <em>“Why did the model get yesterday so wrong?”</em> It breaks the forecast into feature-level contributions, showing which signals pushed the prediction up or down and by how much.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/802/1*owkvx9i-ENDfAAhiesDQ5w.png" /><figcaption>Single prediction explanation for the worst prediction day</figcaption></figure><p>That matters because not every bad prediction has the same cause. Sometimes the model misses because something happened outside the feature set, such as a nearby GP closure, a road accident, or another local disruption. Other times, the model may be facing a rare combination of feature values it has not seen often enough during training. The waterfall plot helps separate those two cases.</p><p><strong>Prediction Intervals</strong></p><p>A point prediction alone isn’t enough. Rostering managers need to know the <em>range</em> of uncertainty, so I computed bootstrap prediction intervals by resampling residuals collected during cross-validation on the training data (cv_resid):</p><pre>import numpy as np<br><br>np.random.seed(42)<br>n_bootstrap = 500<br>base_pred = test[&#39;pred&#39;].values  # point forecasts for the holdout days<br><br># Each bootstrap draw adds a resampled CV residual to every point forecast<br>bootstrap_preds = np.array([<br>    base_pred + np.random.choice(cv_resid, size=len(base_pred), replace=True)<br>    for _ in range(n_bootstrap)<br>])<br><br># Per-day percentiles across the draws give the interval bounds<br>lower_80 = np.percentile(bootstrap_preds, 10, axis=0)<br>upper_80 = np.percentile(bootstrap_preds, 90, axis=0)<br>lower_95 = np.percentile(bootstrap_preds, 2.5, axis=0)<br>upper_95 = np.percentile(bootstrap_preds, 97.5, axis=0)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*P7vZNI8sXbI5bcjCRy5Y1w.png" /><figcaption>Predictions with Bootstrap Intervals</figcaption></figure><p>The red line shows the model’s prediction, while the black line shows the actual patient count. The darker band represents the 80% prediction interval, and the lighter band represents the 95% interval. 
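</p><p>Whether the bands are well calibrated can be checked directly: count how often the actual value lands inside each interval and compare that share with the nominal level (about 0.80 for the 80% band and 0.95 for the 95% band). Below is a minimal sketch under stated assumptions: empirical_coverage is a hypothetical helper rather than code from the repository, and the synthetic residuals and forecasts exist only to make it runnable. With the real data you would pass the test-set actuals together with the lower_80 and upper_80 arrays from the bootstrap code.</p><pre>import numpy as np


def empirical_coverage(actual, lower, upper) -> float:
    """Fraction of observations that fall inside [lower, upper], bounds included."""
    actual, lower, upper = (np.asarray(x, dtype=float) for x in (actual, lower, upper))
    inside = np.logical_and(actual >= lower, upper >= actual)
    return float(inside.mean())


# Hypothetical synthetic data, only to make the example runnable
rng = np.random.default_rng(0)
resid_pool = rng.normal(0.0, 25.0, size=2000)         # stand-in for cv_resid
point_forecast = rng.uniform(90.0, 170.0, size=365)    # stand-in for test['pred']
actual = point_forecast + rng.normal(0.0, 25.0, size=365)

# Forecast plus residual percentiles: the limit the resampling bootstrap approaches
lower_80 = point_forecast + np.percentile(resid_pool, 10)
upper_80 = point_forecast + np.percentile(resid_pool, 90)

coverage_80 = empirical_coverage(actual, lower_80, upper_80)
print(f"80% interval coverage: {coverage_80:.2f}")  # expected to be close to 0.80</pre><p>Because this sketch pools all residuals, the same coverage check run separately on subsets such as flu-peak days or public holidays is a cheap way to probe the pooled-residual caveat discussed below.</p><p>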
The actual values fall within these bands at roughly the expected rates, which suggests the uncertainty estimates are reasonably well calibrated.</p><p>For a rostering manager, this is more useful than a single number. Instead of saying, <em>“We predict 145 patients tomorrow,”</em> the model can say, <em>“We predict 145 patients tomorrow, with an 80% prediction interval of 113 to 177.”</em> That gives the manager a planning range rather than a false sense of precision.</p><p>There is one caveat, though. This bootstrap method assumes the residuals are drawn from the same error distribution across all conditions, which is unlikely to be true. Public holidays, flu peaks, and unusual demand days probably have wider errors than quiet weekdays. A more refined version would estimate condition-specific intervals, but for a first deployment this gives a useful and honest uncertainty range.</p><h4><strong>6. Stakeholder Engagement</strong></h4><p>From experience, MAE and RMSE do not mean much to clinical directors or rostering managers on their own. They are useful modelling metrics, but they do not directly answer the operational question: <em>How many people should we roster tomorrow?</em></p><p>So I framed the model output in two more practical ways.</p><p><strong>Staffing Tier Classification</strong></p><p>Instead of only returning a raw patient-count forecast, the model can also classify each day into a staffing tier that maps directly to roster planning. The exact thresholds should be defined by the clinical operations team, but a simple example could look like this:</p><pre><strong>Low </strong>(&lt;80 patients): 2 doctors, 3 nurses, standard roster <br><strong>Medium </strong>(80–150): 3 doctors, 4 nurses, extra triage nurse <br><strong>High </strong>(&gt;150): 4+ doctors, surge protocol, ED overflow coordination</pre><p>This makes the forecast easier to act on. Each tier has a concrete staffing implication, a cost implication, and a clinical risk implication. 
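</p><p>The example tiers translate into a few lines of code. The sketch below assumes the illustrative thresholds above (below 80, 80 to 150, above 150); staffing_tier and tier_accuracy are hypothetical helper names, not functions from the repository, and the week of numbers is made up. It also shows why the under-prediction bias matters: the example forecast of 145 patients lands in the Medium tier, while the upper bound of its 80% interval (177) lands in High.</p><pre>def staffing_tier(patients: float) -> str:
    """Map a daily patient count (forecast or actual) to the example staffing tier."""
    if patients > 150:
        return "High"
    if patients >= 80:
        return "Medium"
    return "Low"


def tier_accuracy(actuals, predictions) -> float:
    """Share of days on which the predicted tier matches the tier of the actual count."""
    matches = [staffing_tier(a) == staffing_tier(p) for a, p in zip(actuals, predictions)]
    return sum(matches) / len(matches)


print(staffing_tier(145))  # Medium: the point forecast from the example above
print(staffing_tier(177))  # High: the upper bound of its 80% interval

# Made-up week of actuals and forecasts, purely to exercise the function
actual_days = [72, 118, 161, 140, 95, 152, 130]
forecast_days = [85, 121, 149, 138, 90, 158, 127]
print(tier_accuracy(actual_days, forecast_days))</pre><p>tier_accuracy is one way to measure a claim such as “the correct staffing tier about 8 out of 10 days”: it counts the share of holdout days on which the forecast and the actual count fall in the same tier.</p><p>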
The rostering manager no longer has to interpret patient-count deltas in isolation and can use the model output as a planning signal tied directly to staffing decisions.</p><p><strong>Operational Impact</strong></p><p>The real value is not lower MAE for its own sake but better roster decisions. If the model gets the demand tier right, the clinic avoids paying for unnecessary staff on quiet days and reduces the risk of understaffing during demand spikes.</p><p>That is the version a clinical director can use. Instead of saying, <em>“The model has an MAE of 17 patients,”</em> the better message is, <em>“We can identify the correct staffing tier about 8 out of 10 days, and we can give you an 80% prediction interval before the roster is locked in.”</em></p><p>That’s Part 1 done, and I admit it ran long. I promise (as I have in the past) that Part 2 will be much shorter and more intuitive. We’ve gone from breaking down the business problem through data generation, feature engineering, model training, evaluation, and stakeholder framing. The XGBoost model with 87 engineered features reaches a MAPE of 11.9% and produces reasonably well-calibrated prediction intervals that can support rostering decisions. In effect, this is a framework you can plug your own code and data into to produce forecasts.</p><p>In Part 2, I’ll take this trained model and deploy it as a full-stack application: a <em>FastAPI</em> backend serving predictions through a <em>REST API</em>, a <em>React + Tailwind</em> dashboard for stakeholders, and possibly a Supabase database for logging predictions against actuals so we can backfill real outcomes and monitor drift. I’ll also cover deployment considerations and what I’d do differently with real clinic data. 
If you’d like an article on translating predicted patient numbers into optimised staffing levels, let me know.</p><p>As always, all code is open-sourced <a href="https://github.com/wandabwa2004/urgent_care_forecast/tree/main/notebooks">HERE</a>. Feel free to clone it and adapt it for your own use case. If you found this useful, please clap, leave a comment, or share it with someone working on forecasting or operational AI. You can always find my other articles on my <a href="https://medium.com/@hermanwandabwa">profile</a>, and I’m always happy to connect via <a href="https://www.linkedin.com/in/wandabwaherman/">LinkedIn</a>.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=d4ca30007991" width="1" height="1" alt=""><hr><p><a href="https://medium.com/data-science-collective/predicting-daily-patient-volume-for-a-melbourne-urgent-care-clinic-d4ca30007991">Predicting Daily Patient Volume for a Melbourne Urgent Care Clinic</a> was originally published in <a href="https://medium.com/data-science-collective">Data Science Collective</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[From Theory to Code — Building a SIM Replacement Agent in LangGraph (Part 2)]]></title>
            <link>https://medium.com/data-science-collective/part-2-from-theory-to-code-building-a-sim-swap-agent-in-langgraph-7290f5699023?source=rss-58e995b6d0e3------2</link>
            <guid isPermaLink="false">https://medium.com/p/7290f5699023</guid>
            <category><![CDATA[generative-ai-use-cases]]></category>
            <category><![CDATA[ai-agent]]></category>
            <category><![CDATA[llm-applications]]></category>
            <category><![CDATA[agentic-ai]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[Herman Wandabwa]]></dc:creator>
            <pubDate>Mon, 27 Apr 2026 21:35:40 GMT</pubDate>
            <atom:updated>2026-05-07T09:56:05.053Z</atom:updated>
            <content:encoded><![CDATA[<h4>How State, Tools, Nodes, and Edges Power an AI Agent</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ouRPz9aNqoR9P80gvSizRQ.png" /><figcaption>Image generated by the author</figcaption></figure><p><em>Stuck behind a paywall? Read the article for FREE </em><a href="https://hermanwandabwa.medium.com/7290f5699023?source=friends_link&amp;sk=69220bfdbde4ff83250de1435b40f495"><strong><em>here</em></strong></a><em>.</em></p><p>In <a href="https://medium.com/data-science-collective/what-agentic-ai-actually-is-a-sim-replacement-use-case-part-1-b9f1672d68b5">Part 1</a>, I covered what agentic AI actually means and why a SIM card replacement workflow is a good way to test whether an application is genuinely agentic. I introduced the four building blocks that every agent should ideally have: <strong><em>state, tools, nodes,</em></strong> and <strong><em>edges</em></strong>. If you haven’t read the <a href="https://medium.com/data-science-collective/what-agentic-ai-actually-is-a-sim-replacement-use-case-part-1-b9f1672d68b5">first one</a> yet, I’d suggest starting there because this article builds directly on it.</p><p>The goal in this part is to turn the concepts from Part 1 into a working agentic prototype. I’ll use <a href="https://www.langchain.com/langgraph">LangGraph</a>, an orchestration framework for building, managing, and deploying stateful AI agents. A useful feature of LangGraph is the fine-grained control it gives over the design and customisation of sub-agents and related artefacts. My hope is that, by the end, you’ll have a practical grasp of agentic AI through a working application that can run the full SIM swap flow for a telecom company, from identity verification all the way to notifying the customer that the swap is complete. 
The flowchart, the same one used in Part 1, is below:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*fwsawTk6wxb2w9oa.png" /><figcaption>SIM card replacement process</figcaption></figure><p>As always, all the code is open-source and can be found <a href="https://github.com/wandabwa2004/safaricom_agentic_ai">here</a>, so you can clone and run it yourself.</p><p>So let’s get into the interesting bits.</p><h4><strong>1. The State Object</strong></h4><p>Remember the notebook analogy from <a href="https://medium.com/data-science-collective/what-agentic-ai-actually-is-a-sim-replacement-use-case-part-1-b9f1672d68b5">Part 1</a>? <strong><em>State</em></strong> is what the agent carries from step to step: everything it knows at any given moment. In LangGraph, state can be defined as a Python TypedDict.</p><p>The choice of TypedDict over a class is deliberate: it serialises cleanly, it’s straightforward to checkpoint and restore, and LangGraph can track incremental state changes without any custom logic. 
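</p><p>To make “incremental state changes” concrete before the full schema, here is a deliberately simplified model of how a node’s partial update is folded into the state: plain fields are overwritten (last write wins), while a field annotated with a reducer is combined with its previous value. It is a sketch under stated assumptions, not LangGraph’s internal implementation: MiniState and merge_update are hypothetical names, and operator.add stands in for add_messages as the list reducer. It also illustrates the serialisation point, since a TypedDict value is an ordinary dict at runtime and round-trips through JSON without custom code.</p><pre>import json
import operator
from typing import Annotated, TypedDict, get_type_hints


class MiniState(TypedDict):
    # Hypothetical three-field state standing in for SimSwapState
    fraud_flag: bool
    current_step: str
    messages: Annotated[list, operator.add]  # reducer: concatenate old and new lists


def merge_update(state: dict, update: dict) -> dict:
    """Fold a node's partial update into the state (simplified illustration).

    Plain fields follow last-write-wins; a field annotated with a reducer is
    combined as reducer(old_value, new_value). Not LangGraph's real code.
    """
    hints = get_type_hints(MiniState, include_extras=True)
    merged = dict(state)  # shallow copy, so the input state is left untouched
    for key, value in update.items():
        reducers = getattr(hints.get(key), "__metadata__", ())
        if reducers:
            merged[key] = reducers[0](merged.get(key, []), value)
        else:
            merged[key] = value
    return merged


initial: MiniState = {"fraud_flag": False, "current_step": "start", "messages": ["Karibu"]}
updated = merge_update(initial, {"fraud_flag": True, "messages": ["Line checked"]})
print(updated)
# {'fraud_flag': True, 'current_step': 'start', 'messages': ['Karibu', 'Line checked']}
print(json.loads(json.dumps(updated)) == updated)  # True: still a plain dict</pre><p>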
Here’s the full state for the SIM swap agent:</p><pre>from typing import Annotated, TypedDict<br>from langgraph.graph.message import add_messages<br><br>class SimSwapState(TypedDict):<br>    # Subscriber context<br>    subscriber_id: str<br>    subscriber_name: str<br>    msisdn: str<br>    alternate_msisdn: str<br>    id_number: str<br><br>    # Request context<br>    request_id: str<br>    request_reason: str  # &quot;lost&quot; | &quot;stolen&quot; | &quot;damaged&quot; | &quot;upgrade&quot;<br>    request_timestamp: str<br>    channel: str    # &quot;shop&quot; | &quot;call_centre&quot; | &quot;ussd&quot;<br><br>    # Identity verification - Knowledge-Based Authentication(KBA)<br>    id_verification_status: str  # &quot;pending&quot; | &quot;passed&quot; | &quot;failed&quot;<br>    id_verification_attempts: int<br>    id_max_retries: int<br>    kba_questions_correct: int<br><br>    # Line status checks<br>    line_status: str  # &quot;active&quot; | &quot;suspended&quot; | &quot;barred&quot; | &quot;churned&quot; | &quot;not_found&quot;<br>    fraud_flag: bool<br>    pending_swap: bool<br>    line_check_passed: bool<br><br>    # New SIM capture<br>    new_iccid: str<br>    new_imsi: str<br>    sim_validation_status: str # &quot;pending&quot; | &quot;valid&quot; | &quot;invalid&quot;<br>    sim_validation_attempts: int<br>    sim_max_retries: int<br>    sim_validation_errors: list[str]<br><br>    # Current (to-be-blocked) SIM<br>    old_iccid: str<br>    old_imsi: str<br>    old_sim_blocked: bool<br><br>    # Provisioning<br>    new_sim_provisioned: bool<br><br>    # M-PESA<br>    mpesa_registered: bool<br>    mpesa_balance: float<br>    mpesa_reactivated: bool<br>    mpesa_temporary_pin_sent: bool<br><br>    # Network<br>    network_activated: bool<br><br>    # Notifications<br>    confirmation_sms_sent: bool<br><br>    # Process metadata<br>    current_step: str<br>    audit_log: list[dict]<br>    error_messages: list[str]<br>    process_status: str  # 
&quot;in_progress&quot; | &quot;completed&quot; | &quot;rejected&quot; | &quot;halted&quot;<br><br>    # Conversation<br>    messages: Annotated[list, add_messages]</pre><p>That looks like a lot of fields, but it is easy to make sense of if you read it in the groups marked in the code. The <em>subscriber context</em> holds the details the agent needs about the customer before serving them: their MSISDN (phone number), ID number, and an alternate number to which the final confirmation SMS is sent. The <em>request context</em> then captures the reason for the interaction, when it started, and the channel it came through.</p><p>From there, every block maps directly to a stage in the workflow. KBA tracking holds the attempt counts, while the line status block records whether a fraud flag or a pending swap was found. The SIM capture block tracks validation retries, and the pipeline fields (old SIM blocked, new SIM provisioned, M-PESA, network) flip from <em>False</em> to <em>True</em> as each step completes. The metadata block at the very end holds the audit log, accumulated errors, and the overall process status.</p><p>Most state fields follow a simple last-write-wins rule: if a node returns a new value for fraud_flag, the old one is overwritten. The exception is messages, which uses Annotated[list, add_messages] to tell LangGraph to merge new messages into the existing list rather than replace it. That&#39;s how the agent accumulates a full conversation history as it moves through the steps.</p><p>One thing worth noting is that the graph itself is <em>stateless</em>: it has no memory between runs. The state object carries everything the business rules need, from retry limits to fraud flags to M-PESA reactivation status.</p><h4><strong>2. Tools</strong></h4><p>If the state object is the notebook, the tools are what the agent uses to fill it in. 
In this implementation, each tool gets its own service class, and they are all defined <a href="https://github.com/wandabwa2004/safaricom_agentic_ai/tree/dev/tools">here</a>:</p><pre>| <strong>Tool</strong>                          | <strong>What it simulates</strong><br>|-------------------------------|---------------------------------------------|<br>| IdentityVerificationService   | KBA challenge: ID number, frequently called numbers, recent M-PESA transaction, last top-up amount <br>| LineStatusService             | HLR lookup: line status (active/suspended/barred) + fraud flag + pending swap check <br>| SimValidationService          | Validates ICCID format (19–20 digits, 89254… Kenya prefix) and IMSI uniqueness <br>| SimManagementService          | Blocks the old SIM, provisions the new one (binds IMSI to MSISDN in the HLR) <br>| MpesaService                  | Restores the M-PESA wallet on the new SIM and resets the PIN <br>| NetworkActivationService      | Activates voice, SMS, data, and USSD on the new SIM <br>| NotificationService           | Sends the confirmation SMS to the alternate number </pre><p>I mocked all seven tools in this example. They use configurable success rates and a short asyncio.sleep to simulate real-world latency. 
Here&#39;s what IdentityVerificationService looks like:</p><pre>import asyncio<br>import random<br>import uuid<br><br>import config  # project settings (config.py)<br><br>class IdentityVerificationService:<br>    def __init__(self, success_rate: float | None = None):<br>        # Fall back to the configured default when no rate is injected<br>        self.success_rate = (<br>            success_rate if success_rate is not None<br>            else config.IDENTITY_VERIFICATION_SUCCESS_RATE<br>        )<br><br>    async def verify(self, subscriber_id: str) -&gt; dict:<br>        # Simulated network latency<br>        await asyncio.sleep(random.uniform(0.6, 1.4))<br><br>        # Each KBA question is answered correctly with probability success_rate<br>        questions = [&quot;id_number&quot;, &quot;frequently_called&quot;, &quot;recent_mpesa&quot;, &quot;last_topup&quot;]<br>        per_question: dict[str, bool] = {<br>            q: random.random() &lt; self.success_rate for q in questions<br>        }<br>        correct = sum(per_question.values())<br>        passed = correct &gt;= config.KBA_QUESTIONS_REQUIRED<br><br>        return {<br>            &quot;verification_id&quot;: str(uuid.uuid4()),<br>            &quot;status&quot;: &quot;verified&quot; if passed else &quot;failed&quot;,<br>            &quot;questions_total&quot;: config.KBA_QUESTIONS_TOTAL,<br>            &quot;questions_required&quot;: config.KBA_QUESTIONS_REQUIRED,<br>            &quot;questions_correct&quot;: correct,<br>            &quot;per_question&quot;: per_question,<br>            ...<br>        }</pre><p>The KBA challenge asks the customer four independent questions, and at least three must be answered correctly to pass. Both numbers, along with the retry limits, are configurable in <a href="https://github.com/wandabwa2004/safaricom_agentic_ai/blob/dev/config.py">config.py</a>:</p><pre>KBA_QUESTIONS_TOTAL = 4<br>KBA_QUESTIONS_REQUIRED = 3<br>MAX_IDENTITY_RETRIES = 3<br>MAX_SIM_SERIAL_RETRIES = 3</pre><p>You might be wondering why I used mock APIs. The answer is that what is being built and tested here is the agent’s decision-making logic: the retries, routing, state transitions, and fraud handling. That logic is independent of any actual Safaricom API. 
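</p><p>Because the mock draws each answer independently, the odds of each outcome can be worked out before running anything, and seeding a random generator makes a run repeat exactly. The sketch below is illustrative: it assumes a per-question success rate p (0.9 is a placeholder, since the value of IDENTITY_VERIFICATION_SUCCESS_RATE isn’t shown here), applies the 3-of-4 rule and the three attempts from config.py, and its function names are hypothetical rather than taken from the repository.</p><pre>import random
from math import comb


def kba_pass_probability(p: float, total: int = 4, required: int = 3) -> float:
    """Probability that at least `required` of `total` independent answers are correct."""
    return sum(
        comb(total, k) * p**k * (1 - p) ** (total - k)
        for k in range(required, total + 1)
    )


def pass_within_attempts(p: float, max_attempts: int = 3) -> float:
    """Probability that at least one of `max_attempts` independent KBA rounds passes."""
    return 1 - (1 - kba_pass_probability(p)) ** max_attempts


def seeded_answers(seed: int, p: float) -> dict[str, bool]:
    """Reproducible version of the mock's per-question draw (hypothetical helper)."""
    rng = random.Random(seed)  # private generator: same seed, same answers
    questions = ["id_number", "frequently_called", "recent_mpesa", "last_topup"]
    return {q: p > rng.random() for q in questions}


p = 0.9  # placeholder per-question success rate
print(round(kba_pass_probability(p), 4))  # 0.9477
print(round(pass_within_attempts(p), 4))  # 0.9999
print(seeded_answers(7, p) == seeded_answers(7, p))  # True</pre><p>With p = 0.9, a single round passes about 95% of the time and three rounds almost always succeed, so reliably exercising the rejection path needs a much lower rate or a fixed seed.</p><p>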
Mocking the services isolates the agent logic so I can run dozens of deterministic scenarios without needing a live network, and the mock classes can be swapped for real HTTP clients whenever the system is ready for production.</p><h4><strong>3. The Nodes</strong></h4><p><strong><em>Nodes</em></strong> are the individual steps of the workflow. In LangGraph they follow a simple contract: each node is a function (async, in this implementation) that takes the full state as input and returns a dictionary containing only the fields it changed. LangGraph then merges that partial update back into the state automatically.</p><p>Here’s the first node, initiate_request, which fires when a customer starts the process:</p><pre>import uuid<br>from datetime import datetime<br><br>from langchain_core.messages import AIMessage<br><br>import config<br><br># _log is the shared audit helper shown later in this section<br>async def initiate_request(state: dict) -&gt; dict:<br>    request_id = f&quot;SWP-{uuid.uuid4().hex[:8].upper()}&quot;<br>    now = datetime.now().isoformat()<br><br>    msg = (<br>        f&quot;Karibu Safaricom. I&#39;m here to help you replace your SIM for a &quot;<br>        f&quot;**{state.get(&#39;request_reason&#39;, &#39;lost&#39;)}** line.\n\n&quot;<br>        f&quot;Your request reference is **{request_id}**. 
&quot;<br>        f&quot;I&#39;ll take you through the swap step by step.\n\n&quot;<br>        f&quot;**Step 1:** Let&#39;s first verify your identity.&quot;<br>    )<br><br>    audit_log = _log(state, &quot;request_initiated&quot;, {<br>        &quot;request_id&quot;: request_id,<br>        &quot;request_reason&quot;: state.get(&quot;request_reason&quot;, &quot;lost&quot;),<br>        &quot;subscriber_id&quot;: state.get(&quot;subscriber_id&quot;, &quot;&quot;),<br>        &quot;channel&quot;: state.get(&quot;channel&quot;, &quot;shop&quot;),<br>    })<br><br>    return {<br>        &quot;request_id&quot;: request_id,<br>        &quot;request_timestamp&quot;: now,<br>        &quot;current_step&quot;: &quot;initiate_request&quot;,<br>        &quot;process_status&quot;: &quot;in_progress&quot;,<br>        &quot;id_verification_status&quot;: &quot;pending&quot;,<br>        &quot;id_verification_attempts&quot;: 0,<br>        &quot;id_max_retries&quot;: config.MAX_IDENTITY_RETRIES,<br>        &quot;sim_validation_status&quot;: &quot;pending&quot;,<br>        &quot;sim_validation_attempts&quot;: 0,<br>        &quot;sim_max_retries&quot;: config.MAX_SIM_SERIAL_RETRIES,<br>        &quot;old_sim_blocked&quot;: False,<br>        &quot;new_sim_provisioned&quot;: False,<br>        ...<br>        &quot;audit_log&quot;: audit_log,<br>        &quot;messages&quot;: [AIMessage(content=msg)],<br>    }</pre><p>This node does three things. It <em>generates a unique request reference</em>, <em>sets every counter and flag to its initial value,</em> and <em>appends the opening message</em>. 
The second point is deliberate: all initialisation lives here rather than being scattered across downstream nodes.</p><p>Now let’s look at verify_identity, because this is the first node where things can actually go wrong:</p><pre>async def verify_identity(state: dict) -&gt; dict:<br>    attempts = state.get(&quot;id_verification_attempts&quot;, 0) + 1<br>    result = await identity_service.verify(state[&quot;subscriber_id&quot;])<br><br>    correct = result[&quot;questions_correct&quot;]<br>    required = result[&quot;questions_required&quot;]<br>    total = result[&quot;questions_total&quot;]<br><br>    if result[&quot;status&quot;] == &quot;verified&quot;:<br>        msg = (<br>            f&quot;Identity verified — you answered **{correct}/{total}** questions &quot;<br>            f&quot;correctly (minimum {required} required). Thank you!\n\n&quot;<br>            f&quot;**Step 2:** I&#39;ll now check the status of your line.&quot;<br>        )<br>        status = &quot;passed&quot;<br>    else:<br>        remaining = state.get(&quot;id_max_retries&quot;, config.MAX_IDENTITY_RETRIES) - attempts<br>        reasons = &quot;; &quot;.join(result.get(&quot;failure_reasons&quot;, []))<br>        if remaining &gt; 0:<br>            msg = (<br>                f&quot;I got **{correct}/{total}** correct — I need at least **{required}**. &quot;<br>                f&quot;Specifically: {reasons}.\n\n&quot;<br>                f&quot;Let&#39;s try again. You have **{remaining}** attempt(s) remaining.&quot;<br>            )<br>        else:<br>            msg = (<br>                f&quot;Identity verification failed after &quot;<br>                f&quot;{state.get(&#39;id_max_retries&#39;, config.MAX_IDENTITY_RETRIES)} attempts. 
&quot;<br>                f&quot;For your security, I cannot proceed with the SIM swap.&quot;<br>            )<br>        status = &quot;failed&quot;<br><br>    return {<br>        &quot;current_step&quot;: &quot;verify_identity&quot;,<br>        &quot;id_verification_status&quot;: status,<br>        &quot;id_verification_attempts&quot;: attempts,<br>        &quot;kba_questions_correct&quot;: correct,<br>        &quot;audit_log&quot;: _log(state, &quot;identity_verification&quot;, {...}),<br>        &quot;messages&quot;: [AIMessage(content=msg)],<br>    }</pre><p>The node increments id_verification_attempts itself and produces a message that depends on the outcome. If retries remain, the customer is told how many are left. If they are exhausted, the node delivers the rejection message. Most importantly, though, the node does not decide what happens next: it simply sets id_verification_status to &quot;passed&quot; or &quot;failed&quot; and returns. The decision about whether to retry or reject lives in the edges.</p><p>The other nodes follow the same pattern. check_line_status queries the Home Location Register (HLR) and sets <em>fraud_flag</em>, <em>pending_swap</em>, and <em>line_check_passed</em>. capture_new_sim validates the new SIM’s ICCID and IMSI. 
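</p><p>The format rule from the tools table (19 to 20 digits, starting with the 89254 prefix) is simple enough to express as a regular expression. The sketch below is a hypothetical stand-in, not the repository’s SimValidationService: it returns a list of error strings, matching the shape of the sim_validation_errors state field, and models IMSI uniqueness as membership in a set of already-issued IMSIs. The sample ICCIDs and IMSI are made up.</p><pre>import re

# 89254 prefix followed by 14 or 15 more digits = 19 or 20 digits in total
ICCID_PATTERN = re.compile(r"89254[0-9]{14,15}")


def validate_iccid(iccid: str) -> list[str]:
    """Return validation errors for an ICCID; an empty list means it passed."""
    cleaned = iccid.replace(" ", "")
    if not re.fullmatch(r"[0-9]+", cleaned):
        return ["ICCID must contain digits only"]
    if not ICCID_PATTERN.fullmatch(cleaned):
        return ["ICCID must be 19-20 digits and start with 89254"]
    return []


def validate_imsi(imsi: str, issued_imsis: set[str]) -> list[str]:
    """Uniqueness check: reject an IMSI that is already bound to another line."""
    return ["IMSI is already in use"] if imsi in issued_imsis else []


print(validate_iccid("8925402123456789012"))  # [] (19 digits, correct prefix)
print(validate_iccid("8944102123456789012"))  # wrong prefix
print(validate_imsi("639021234567890", {"639021234567890"}))  # ['IMSI is already in use']</pre><p>In the graph, a non-empty error list would translate into sim_validation_status being set to “invalid”, leaving route_after_sim_validation to choose between another attempt and rejection.</p><p>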
Thereafter, the core pipeline runs sequentially: block_old_sim, provision_new_sim, reactivate_mpesa, activate_on_network, and finally send_confirmation_sms, which assembles the full swap summary table for the customer.</p><p>There’s also a shared log helper used by every node:</p><pre>from datetime import datetime<br><br>def _log(state: dict, action: str, details: dict) -&gt; list[dict]:<br>    entry = {<br>        &quot;timestamp&quot;: datetime.now().isoformat(),<br>        &quot;step&quot;: state.get(&quot;current_step&quot;, &quot;unknown&quot;),<br>        &quot;action&quot;: action,<br>        &quot;request_id&quot;: state.get(&quot;request_id&quot;, &quot;&quot;),<br>        **details,<br>    }<br>    # Return a new list: audit_log is overwritten (last write wins),<br>    # so the returned value must carry every earlier entry as well<br>    return state.get(&quot;audit_log&quot;, []) + [entry]</pre><p>It builds one structured entry per node execution and returns the earlier entries plus the new one. Because audit_log follows the default last-write-wins rule, returning the full list is what preserves the history. And because it is just another field in the state, it travels with the swap from start to finish, building up a complete record of every step without needing any external logging system.</p><h4><strong>4. Edges — Where the Decisions Actually Live</strong></h4><p>This is the most important component of the agentic framework. If nodes are the checklist items, edges are the logic between them: they decide which node runs next.</p><p>There are two kinds of edge. <em>Linear edges</em> connect nodes that always run in sequence, no matter what. The core swap pipeline is entirely linear once the gate checks pass:</p><pre>graph.add_edge(&quot;block_old_sim&quot;, &quot;provision_new_sim&quot;)<br>graph.add_edge(&quot;provision_new_sim&quot;, &quot;reactivate_mpesa&quot;)<br>graph.add_edge(&quot;reactivate_mpesa&quot;, &quot;activate_on_network&quot;)<br>graph.add_edge(&quot;activate_on_network&quot;, &quot;send_confirmation_sms&quot;)</pre><p>The logic encoded above is simple: you cannot provision before blocking, or activate before provisioning. 
The order is non-negotiable, and the linear edges encode it directly.</p><p><em>Conditional edges</em>, on the other hand, are functions that read the current state and return the name of the next node to run. They are where the agentic behaviour comes from, and this graph has three of them. Let’s look at the one after identity verification:</p><pre>def route_after_identity(state: SimSwapState) -&gt; str:<br>    if state.get(&quot;id_verification_status&quot;) == &quot;passed&quot;:<br>        return &quot;check_line_status&quot;<br>    if state.get(&quot;id_verification_attempts&quot;, 0) &gt;= state.get(<br>        &quot;id_max_retries&quot;, config.MAX_IDENTITY_RETRIES<br>    ):<br>        return &quot;reject_request&quot;<br>    return &quot;verify_identity&quot;</pre><p>There are three possible outcomes. If KBA passed, move on to the line check. If it failed but retries remain, return verify_identity, the same node that just ran. If attempts are exhausted, route to reject_request. The final return statement, which sends a node back to itself, is the retry loop. A basic chatbot can’t do this: it has no way to revisit a step based on what just happened. This short function is the difference between a fixed pipeline and an agent.</p><p>The line check router is deliberately simpler:</p><pre>def route_after_line_check(state: SimSwapState) -&gt; str:<br>    if state.get(&quot;line_check_passed&quot;):<br>        return &quot;capture_new_sim&quot;<br>    return &quot;reject_request&quot;</pre><p>It has two outcomes and no retry. Fraud flags, suspended lines, and pending swaps all terminate immediately because none of those are recoverable through self-service. 
There is deliberately no loop here; the edge simply encodes the business rule.</p><p>The SIM validation router mirrors the identity one, with pass, retry up to the limit, or reject as the three possible outcomes:</p><pre>def route_after_sim_validation(state: SimSwapState) -&gt; str:<br>    if state.get(&quot;sim_validation_status&quot;) == &quot;valid&quot;:<br>        return &quot;block_old_sim&quot;<br>    if state.get(&quot;sim_validation_attempts&quot;, 0) &gt;= state.get(<br>        &quot;sim_max_retries&quot;, config.MAX_SIM_SERIAL_RETRIES<br>    ):<br>        return &quot;reject_request&quot;<br>    return &quot;capture_new_sim&quot;</pre><p>The value of keeping routing logic in edges rather than nodes is that each router can be unit tested as a pure function of state: pass it a dict with the relevant fields and <em>assert</em> on the returned string. You don’t need to wire up the full graph, call any services, or manage <em>async</em>. In addition, if a business rule changes, say the fraud policy becomes a soft warning rather than a hard stop, the routing side of that change lives in one function in <a href="https://github.com/wandabwa2004/safaricom_agentic_ai/blob/dev/graph/edges.py">edges.py</a> rather than being scattered across the nodes.</p><h4><strong>Putting It All Together</strong></h4><p>The graph assembly lives in <a href="https://github.com/wandabwa2004/safaricom_agentic_ai/blob/dev/graph/builder.py">builder.py</a>. 
It is a straightforward sequence that follows this path: <em>create the graph, register each node</em>, <em>connect them with edges,</em> and <em>compile</em>:</p><pre>from langgraph.graph import StateGraph, END<br><br>def build_graph():<br>    graph = StateGraph(SimSwapState)<br><br>    # Register nodes<br>    graph.add_node(&quot;initiate_request&quot;, initiate_request)<br>    graph.add_node(&quot;verify_identity&quot;, verify_identity)<br>    graph.add_node(&quot;check_line_status&quot;, check_line_status)<br>    graph.add_node(&quot;capture_new_sim&quot;, capture_new_sim)<br>    graph.add_node(&quot;block_old_sim&quot;, block_old_sim)<br>    graph.add_node(&quot;provision_new_sim&quot;, provision_new_sim)<br>    graph.add_node(&quot;reactivate_mpesa&quot;, reactivate_mpesa)<br>    graph.add_node(&quot;activate_on_network&quot;, activate_on_network)<br>    graph.add_node(&quot;send_confirmation_sms&quot;, send_confirmation_sms)<br>    graph.add_node(&quot;reject_request&quot;, reject_request)<br><br>    # Entry point<br>    graph.set_entry_point(&quot;initiate_request&quot;)<br><br>    # Linear edge: entry flows into first gate<br>    graph.add_edge(&quot;initiate_request&quot;, &quot;verify_identity&quot;)<br><br>    # Conditional routing at the three decision points<br>    graph.add_conditional_edges(&quot;verify_identity&quot;, route_after_identity, {<br>        &quot;check_line_status&quot;: &quot;check_line_status&quot;,<br>        &quot;reject_request&quot;: &quot;reject_request&quot;,<br>        &quot;verify_identity&quot;: &quot;verify_identity&quot;,   # the retry self-loop<br>    })<br>    graph.add_conditional_edges(&quot;check_line_status&quot;, route_after_line_check, {<br>        &quot;capture_new_sim&quot;: &quot;capture_new_sim&quot;,<br>        &quot;reject_request&quot;: &quot;reject_request&quot;,<br>    })<br>    graph.add_conditional_edges(&quot;capture_new_sim&quot;, route_after_sim_validation, {<br>        &quot;block_old_sim&quot;: 
&quot;block_old_sim&quot;,<br>        &quot;capture_new_sim&quot;: &quot;capture_new_sim&quot;,   # retry self-loop<br>        &quot;reject_request&quot;: &quot;reject_request&quot;,<br>    })<br><br>    # Linear pipeline: core swap<br>    graph.add_edge(&quot;block_old_sim&quot;, &quot;provision_new_sim&quot;)<br>    graph.add_edge(&quot;provision_new_sim&quot;, &quot;reactivate_mpesa&quot;)<br>    graph.add_edge(&quot;reactivate_mpesa&quot;, &quot;activate_on_network&quot;)<br>    graph.add_edge(&quot;activate_on_network&quot;, &quot;send_confirmation_sms&quot;)<br><br>    # Terminal nodes<br>    graph.add_edge(&quot;send_confirmation_sms&quot;, END)<br>    graph.add_edge(&quot;reject_request&quot;, END)<br><br>    return graph.compile()</pre><p>One thing I like about LangGraph is that the compiled graph can export itself as a Mermaid diagram, which gives you a graphical view of the flow:</p><pre>def get_graph_mermaid() -&gt; str:<br>    compiled = build_graph()<br>    return compiled.get_graph().draw_mermaid()</pre><p>The diagram is generated from the same code that runs in production, which means it can never go stale. Change how the graph is wired and the diagram reflects it on the next export. Anyone who has ever maintained a system architecture diagram in a slide deck, only to watch it become outdated within a week, will appreciate why that matters.</p><h4><strong>Five Scenarios, Five Paths Through the Same Graph</strong></h4><p>I added a <a href="https://github.com/wandabwa2004/safaricom_agentic_ai/tree/dev/demo">demo</a> that you can <a href="https://github.com/wandabwa2004/safaricom_agentic_ai/blob/dev/README.md">set up</a> locally and run. The demo comes with five built-in scenarios, each using a different subscriber from the seed data. 
You run them like this:</p><pre>python -m demo.cli happy_path<br>python -m demo.cli identity_retry<br>python -m demo.cli fraud_flag<br>python -m demo.cli bad_iccid<br>python -m demo.cli full_failure</pre><p>Here’s what each one exercises:</p><pre>| Scenario       | Subscriber                     | What actually happens |<br>|----------------|--------------------------------|---------------------------|<br>| happy_path     | Wanjiku Kamau (+254712345678)  | All gates pass on the first attempt, full pipeline runs, swap completes |<br>| identity_retry | —                              | KBA fails once, the self-loop fires, passes on the second attempt |<br>| fraud_flag     | Njoroge Mwangi (+254745678901) | Line check finds a fraud flag, `route_after_line_check` returns `reject_request` immediately |<br>| bad_iccid      | —                              | SIM validation fails on the first attempt, retries, eventually rejects |<br>| full_failure   | —                              | Identity verification exhausted after three attempts, process rejected |</pre><p>The important thing to observe is that all five paths run through the same compiled graph. The agent doesn’t have a separate code path for fraud versus normal flow. The fraud flag is just a field in the state; the edge reads it and routes accordingly. That’s all there is to it.</p><p>Below is a happy-path run that I did. Its output looks roughly like this:</p><pre>Agent: Karibu Safaricom. I&#39;m here to help you replace your SIM for a lost line.<br><br>Your request reference is SWP-3F8A21DC. I&#39;ll take you through the swap step by step.<br><br>Step 1: Let&#39;s first verify your identity.<br><br>Agent: Identity verified — you answered 3/4 questions correctly (minimum 3 required). 
Thank you!<br><br>Step 2: I&#39;ll now check the status of your line.<br><br>Agent: Line status confirmed.<br>- MSISDN: +254712345678<br>- Name: Wanjiku Kamau<br>- Status: Active<br>- Current SIM serial: 89254…5678<br>- M-PESA: Registered (balance Ksh 4,320.00)<br><br>Step 3: Please hand me the new SIM so I can capture its serial (ICCID) and IMSI.<br><br>...<br><br>Agent: Your new SIM is now LIVE on the Safaricom network — voice, SMS, data and USSD are all enabled.<br><br>SIM Swap Summary<br>| Request Ref       | SWP-3F8A21DC          |<br>| MSISDN            | +254712345678         |<br>| Old SIM           | 89254…1234 (Blocked)  |<br>| New SIM           | 89254…5678 (Active)   |<br>| M-PESA            | Restored (Ksh 4,320)  |<br>| Confirmation SMS  | Delivered             |<br><br>AUDIT LOG (9 entries)<br>  [2026-04-25T10:12:03] request_initiated<br>  [2026-04-25T10:12:04] identity_verification<br>  [2026-04-25T10:12:05] line_status_check<br>  [2026-04-25T10:12:06] sim_validation<br>  [2026-04-25T10:12:07] block_old_sim<br>  [2026-04-25T10:12:08] provision_new_sim<br>  [2026-04-25T10:12:09] mpesa_reactivate<br>  [2026-04-25T10:12:10] network_activation<br>  [2026-04-25T10:12:11] confirmation_sms<br><br>Final status: completed</pre><p>The run above produced nine audit entries, one for each node it passed through, each with a timestamp and a request ID. If you compile the graph with a LangGraph checkpointer (the build_graph() above compiles without one), the full state at any point in the run can also be saved and replayed. That’s not something you get from a conventional chatbot.</p><h4><strong>What You Actually Have Now</strong></h4><p>If you run this locally, here’s what you’ve built:</p><ul><li><strong>The graph is the documentation: </strong>The Mermaid export is generated from live code. The retry loops and the fraud halt appear in the diagram as the conditional edges registered in builder.py. Like the M-PESA skip for unregistered lines, they are defined in code rather than in a hand-maintained chart. 
Nobody has to keep a separate flowchart up to date.</li><li><strong>The state is the audit trail: </strong>Every node appends one entry to audit_log. By the time the process reaches its terminal node, you have a complete, timestamped record of exactly what the agent did and why. That record lives in the same object as the result, so the prototype needs no separate logging infrastructure.</li><li><strong>The edges are the policy: </strong>If Safaricom changes the KBA requirement from three-out-of-four to four-out-of-four, that’s a one-line change in config.py. If fraud policy changes from immediate termination to a human-escalation path, you update route_after_line_check, add a new node, and wire that node into builder.py. The rest of the graph is unaffected.</li></ul><h4><strong>What’s Still Missing</strong></h4><p>What I’ve put together here is a solid prototype. But there’s a meaningful gap between prototype and production, and I’ll try to cover that gap in Part 3.</p><p>Remember that I haven’t connected this to anything external yet. There’s a FastAPI wrapper that exposes the agent as an OpenAI-compatible endpoint so Open WebUI can talk to it, but I haven’t walked through it here. I also haven’t added observability: right now, if something fails mid-run, you would need to dig through the state manually. There is also no human-in-the-loop interrupt, so an operator cannot pause the flow and review it before the old SIM gets blocked. For a production SIM swap system, that last one is not optional.</p><p>I promise to cover most of these in Part 3: the FastAPI service, hooking into Langfuse for tracing, and adding interrupt-based human review at the fraud flag step. The code will be extended in the same repo.</p><p>The full source code, and a README on how to run it, are available <a href="https://github.com/wandabwa2004/safaricom_agentic_ai"><strong>HERE</strong></a>.</p><p>I hope this was useful. 
If you found it helpful, please follow me for more like it, leave a comment, clap for me, and of course let me know what else you’d like to see in Part 3. You can also find all my other articles on my <a href="https://medium.com/@hermanwandabwa">profile</a> and connect with me on <a href="https://www.linkedin.com/in/wandabwaherman/">LinkedIn</a>. Remember that all my articles are FREE to read.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=7290f5699023" width="1" height="1" alt=""><hr><p><a href="https://medium.com/data-science-collective/part-2-from-theory-to-code-building-a-sim-swap-agent-in-langgraph-7290f5699023">From Theory to Code — Building a SIM Replacement Agent in LangGraph (Part 2)</a> was originally published in <a href="https://medium.com/data-science-collective">Data Science Collective</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[What Agentic AI Actually Is: A SIM Replacement Use Case (Part 1)]]></title>
            <link>https://medium.com/data-science-collective/what-agentic-ai-actually-is-a-sim-replacement-use-case-part-1-b9f1672d68b5?source=rss-58e995b6d0e3------2</link>
            <guid isPermaLink="false">https://medium.com/p/b9f1672d68b5</guid>
            <category><![CDATA[agentic-ai]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[data-analytics]]></category>
            <category><![CDATA[large-language-models]]></category>
            <dc:creator><![CDATA[Herman Wandabwa]]></dc:creator>
            <pubDate>Mon, 20 Apr 2026 10:51:16 GMT</pubDate>
            <atom:updated>2026-04-20T10:52:53.664Z</atom:updated>
            <content:encoded><![CDATA[<h4>How a real, multi-step telco workflow exposes the architectural gap between chatbots and agents</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*47418695aOkrc-BzqnldBg.png" /><figcaption>Image generated by the author</figcaption></figure><p><strong>Stuck behind a paywall? Read the article for FREE </strong><a href="https://hermanwandabwa.medium.com/what-agentic-ai-actually-is-a-sim-replacement-use-case-part-1-b9f1672d68b5?source=friends_link&amp;sk=ba7b37ebf34944a94c9af14e2ced55ee"><strong>here</strong></a></p><p>It’s been quite some time since I started writing “seriously” about <a href="https://hermanwandabwa.medium.com/list/data-science-and-analytics-af1e961b7a0f">data analytics and science</a>, <a href="https://hermanwandabwa.medium.com/list/optimisation-13c618431392">optimisation</a> and, most recently, <a href="https://hermanwandabwa.medium.com/list/agentic-ai-and-llms-6d358df6e86c">generative AI</a>. In the Gen AI space, I’ve tried to cover topics such as RAG, fine-tuning, agentic AI, and a fair bit in between. If you’ve read any of my articles, you’ll know I lean towards simple, practical explanations rather than theoretical ones. I also share all links on my <a href="https://hermanwandabwa.medium.com/">profile</a>, and all are FREE to read.</p><p>Agentic AI is THE buzzword right now, and I’m happy to join the bandwagon with a series on it. After all, I’m just human 😂.</p><p>But here’s what actually pushed me to write this. I kept reading articles that promised to explain “agentic AI” and ended up describing a chatbot with a slightly longer prompt. Most of them were wrappers around an LLM (Large Language Model) with one or two API calls bolted on, dressed up as something brand new. 
Simply put, an agentic AI system is one that can accomplish a specific goal with limited human supervision, exhibiting real autonomy, goal-driven behaviour, and adaptability rather than just operating within predefined constraints. I don’t buy the watered-down versions, and of course I’m NOT going to claim I’ve built something wildly novel either.</p><p>But I did build one that is genuinely agentic: an agent that runs the full SIM card replacement process for a mobile network on its own, from identity verification all the way to customer notification, with no human pushing it along at each step. I built it as a reference implementation around the <a href="https://www.safaricom.co.ke/">Safaricom</a> Kenya SIM swap flow, the process a customer goes through when they need a replacement SIM.</p><p>This is Part 1 of a three-part series. Here I’ll cover what agentic AI actually means, why a SIM replacement workflow is a good example of it, and the four mental building blocks every agent needs. Part 2 turns everything here into working open-source Python code with LangGraph. Part 3 is about what it takes to ship one of these into production. So don’t expect any code in this part, just the concepts you’ll need to make sense of the rest of the series.</p><h3>What Makes an AI “Agentic”?</h3><p>Let me clear something up, because this word gets abused a lot.</p><p>A standard chatbot is reactive. You say X, it says Y. It has no memory of what it did five messages ago. It cannot call external systems on your behalf. And critically, it cannot decide what to do next based on what just happened. Every turn is independent. Have you ever found yourself typing “talk to a human” or “speak to an agent” into a customer service chatbot just to get transferred to a real person? The bot did not figure out on its own that you were frustrated or that your issue was beyond it. You had to spell it out. That is exactly the limitation we are talking about.</p><p>An agentic AI is different. 
It decides which action to take next based on the current situation. It calls real tools like APIs, databases, and provisioning systems. It handles failures gracefully, retrying or escalating as needed. It maintains its own running state across many steps. Lastly, it works towards a goal without a human directing every move, unless its design explicitly calls for human checkpoints.</p><p>Think of the difference between a customer service rep reading from a script and an experienced retail agent who knows the process, handles edge cases, and uses their own judgment at each step. That’s the gap we’re bridging. The <em>script-reader</em> is your chatbot, while the <em>experienced agent</em> is your agent.</p><p>The SIM replacement use case is a good testbed for this because it is genuinely multi-step, genuinely branching, and genuinely fails in interesting ways. Identity verification can fail several times before the system gives up. The new SIM serial may not match what the customer reads out. M-PESA reactivation can stall. A line flagged for fraud terminates the process entirely. A simple chatbot collapses at the first unexpected branch, while a properly designed agent handles all of it.</p><h3>The Four Building Blocks of Every Agent</h3><p>Before we get into the specifics of the SIM replacement workflow, here’s the mental model that makes everything else click. Every agent, regardless of framework, comes down to four concepts.</p><pre>Concept   | Question it answers                    | Analogy<br>----------|----------------------------------------|--------------------------------<br>State     | What does the agent know right now?    | A notebook the agent carries<br>Tools     | What can the agent do?                 | The agent&#39;s hands<br>Nodes     | How does each step work?               | Individual tasks on a checklist<br>Edges     | What should the agent do next?         
| The decision logic between tasks</pre><p>Let me walk through each one in plain language.</p><ol><li><strong>State</strong> is the agent’s running memory. It’s everything the agent knows at any moment, including who the customer is, what’s been verified, what’s failed, and what’s still pending. In the SIM replacement scenario, <em>state</em> holds the customer’s MSISDN (their phone number), their ID details, how many verification attempts have been used, the new SIM’s serial number, whether the old line has been blocked, whether the new SIM has been activated, whether M-PESA has been re-linked, and so on. Think of it as a notebook the agent carries from step to step. Every step reads from it and writes to it.</li><li><strong>Tools</strong> are what the agent can actually do in the real world: APIs to call, databases to query, provisioning systems to update, SMS gateways to message. In the SIM replacement example, the tools include an identity verification service, a line status check, the SIM provisioning system that blocks the old IMSI and binds the new one, an M-PESA reactivation service, and a notification service. Without tools, an agent is just talk. With tools, it can take action.</li><li><strong>Nodes</strong> are the individual steps of the workflow. Each node has one job. In this example, a node’s job might be to verify the customer’s identity, check the line status, block the old SIM, activate the new one, re-link M-PESA, or send the confirmation SMS. Each node does its work, usually by calling one or more tools, then updates the state. Keeping each node focused on a single job is what makes the whole system testable and debuggable later.</li><li><strong>Edges</strong> are the decision logic between steps. After a node finishes, an edge looks at the current state and decides where to go next. This is where the “agentic” behaviour actually lives. For example, if identity verification passes, go to the line status check. 
If it fails but retries remain, loop back and try again; if retries are exhausted, terminate the request. Edges are how an agent decides, dynamically, what to do next based on what just happened. A chatbot doesn’t have edges. It just responds.</li></ol><p>That’s the entire mental model: <strong>state, tools, nodes</strong>, and <strong>edges</strong>. Once you have these four ideas in your head, every agent you ever look at will make sense.</p><h3>The Problem We’re Actually Solving</h3><p>When a Safaricom customer loses their SIM card, the process at an M-PESA agent or retail shop looks roughly like this (and this is just one path that I’m keen to automate):</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*z1GrMnWZR7XeMK6gDMT5Lg.png" /><figcaption>SIM card replacement process</figcaption></figure><p>Each step depends on the previous one. Some steps, like identity verification, can loop up to a limit. Some outcomes are terminal: a fraud flag, for example, ends the process immediately. Others adapt to circumstances: if the customer has forgotten their original PIN, the workflow takes a different verification path using deeper account history questions instead of failing outright. This is exactly the kind of workflow that exposes whether your “agent” is real and works or is simply a marketing gimmick.</p><p>A traditional chatbot would struggle with this. It can answer “what happens when I lose my SIM” with a script, but it cannot actually run the process, handle the branching, retry failed steps, or decide between standard verification and the deeper PIN-recovery path based on what the customer remembers. An agent can, and that’s the whole point.</p><h3>Why This Matters Now</h3><p>The framing I keep coming back to is this: an agent is not a smarter chatbot. It’s a different architecture entirely. You stop thinking about responses and start thinking about processes. You stop thinking about prompts and start thinking about state. 
That shift in mental model is what separates the people building real agentic systems from the people slapping the label on a chatbot and hoping no one notices.</p><p>There’s a practical reason this matters now. The cost of LLM calls has dropped enough that running a multi-step workflow is no longer prohibitive. Frameworks like LangGraph have matured to the point where the plumbing is no longer the hard part. The constraint has shifted from “can we build this” to “do we understand what we’re building.” For organisations sitting on workflows like the SIM replacement process (and most telcos, utilities, and service providers have dozens of them), the door is open.</p><p>The interesting question is no longer whether agents work. It’s which workflows are worth turning into agents and what you need to know to build them properly.</p><h3>What Comes Next</h3><p>That’s Part 1, and I intended to keep it as conceptual as possible. I covered what agentic AI actually means, why a SIM replacement workflow is a good test of whether something is genuinely agentic, and the four mental building blocks (state, tools, nodes, edges) that every agent shares.</p><p>In Part 2, I’ll translate all of this into working Python code using LangGraph. We’ll build the actual state object, wire up the tools, write the nodes, and define the edges that handle the retry loops, the fraud flag termination, and the alternative verification paths. Everything in this article will show up in open-source code that anyone can customise.</p><p>In Part 3, I’ll cover deployment: mostly what it takes to move from a working local prototype to a production system, including observability, human-in-the-loop interrupts, and the FastAPI service.</p><p>I hope this walkthrough was useful. Don’t forget to follow me, clap for me, and leave a comment. 
Let me know if you’d like me to cover anything specific in Part 2 or Part 3, whether that’s a particular LangGraph pattern, a deployment detail, or something else entirely. If you want to check out my other articles, you can find them on my <a href="https://medium.com/@hermanwandabwa"><strong>profile</strong></a>. I’m also happy to connect via <a href="https://www.linkedin.com/in/wandabwaherman/"><strong>LinkedIn</strong></a>.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=b9f1672d68b5" width="1" height="1" alt=""><hr><p><a href="https://medium.com/data-science-collective/what-agentic-ai-actually-is-a-sim-replacement-use-case-part-1-b9f1672d68b5">What Agentic AI Actually Is: A SIM Replacement Use Case (Part 1)</a> was originally published in <a href="https://medium.com/data-science-collective">Data Science Collective</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[From Jupyter Notebook to Full-Stack ML App with FastAPI, React, and Supabase]]></title>
            <link>https://medium.com/data-science-collective/stop-guessing-staffing-needs-how-id-predict-daily-museum-visitors-before-they-arrive-part-2-760498e03c4f?source=rss-58e995b6d0e3------2</link>
            <guid isPermaLink="false">https://medium.com/p/760498e03c4f</guid>
            <category><![CDATA[museum-analytics]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[visitor-forecasting]]></category>
            <category><![CDATA[full-stack-development]]></category>
            <dc:creator><![CDATA[Herman Wandabwa]]></dc:creator>
            <pubDate>Mon, 13 Apr 2026 00:50:47 GMT</pubDate>
            <atom:updated>2026-04-25T03:23:50.890Z</atom:updated>
            <content:encoded><![CDATA[<h4>A hands-on guide to deploying an XGBoost model as a REST API and interactive dashboard for daily visitor forecasting</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*0A9bW6ePT23GSEC_ml02VQ.png" /><figcaption>Image generated by the author</figcaption></figure><p><strong>Author’s note:</strong> <em>This article uses independently created code and synthetic data for illustration. The methods described are standard forecasting and machine learning techniques commonly used across many application areas.</em></p><p><strong>Stuck behind a paywall? Read the article for FREE </strong><a href="https://hermanwandabwa.medium.com/760498e03c4f?source=friends_link&amp;sk=91968c50229a25008617d04de471b7ce"><strong>here</strong></a></p><p>In <a href="https://medium.com/data-science-collective/stop-guessing-staffing-needs-predicting-daily-museum-visitors-before-they-arrive-part-1-336ac0ca4f60">Part 1</a>, I covered the data science workflow behind this project: generating synthetic museum visitor data, engineering 80+ features from 26 raw columns, training and comparing multiple models, and evaluating the winner. XGBoost took the top spot with a MAPE of 14.0%. I also showed how to frame model performance in business terms that museum directors actually care about.</p><p>But a model sitting in a Jupyter notebook is useless to a museum operations manager who needs to plan staffing for next Tuesday. The real value comes from making predictions accessible to the people who need them, through a working application with a user interface.</p><p>In this part, I’ll take the trained model and deploy it as a full-stack application: a FastAPI backend that exposes the model as an API, a React dashboard for the frontend, and a Supabase database for logging predictions. 
By the end of this article, you’ll understand not just <em>what</em> I built, but <em>why</em> each architectural decision was made.</p><p>I recommend once again that you start with <a href="https://medium.com/data-science-collective/stop-guessing-staffing-needs-predicting-daily-museum-visitors-before-they-arrive-part-1-336ac0ca4f60">Part 1</a> if you haven’t read it yet, as it covers the <em>why</em> and <em>what</em> behind the model. This article covers the <em>how</em> of getting it into production. As always, all code is open-sourced <a href="https://github.com/wandabwa2004/museum-visitor-prediction">here</a>.</p><h3><strong>From a Notebook to Production — The Full Stack</strong></h3><ol><li><strong>Why We Need a Backend at All</strong></li></ol><p>Before diving into code, let’s establish why a production setting needs a backend in the first place. The trained XGBoost model is saved as a .json file on disk. It can only be used by code that loads it back with the XGBoost library, which in this project means Python.</p><p>A React web dashboard, on the other hand, is JavaScript running in a browser, and it cannot directly load a Python model. The solution is a <strong>backend server</strong>: a Python application that loads the model and exposes it over HTTP. Any client (a browser, a mobile app, or a spreadsheet plugin) can then send it a request and receive a prediction in return.</p><p>This pattern is called a <a href="https://www.redhat.com/en/topics/api/what-is-a-rest-api"><strong>REST API</strong></a> (Representational State Transfer Application Programming Interface). Think of it like a restaurant: you (the frontend) don’t walk into the kitchen and cook your own food. You hand a request to a waiter (the API), the kitchen (the backend) prepares the response, and the waiter brings it back to you in a standard format. 
You never need to know how the kitchen works.</p><p><strong>a) Backend — FastAPI</strong></p><p>I used <a href="https://fastapi.tiangolo.com/">FastAPI</a>, a modern Python framework for building APIs quickly and correctly. Two things make it particularly good for ML serving:</p><ol><li><strong>Automatic data validation —</strong> You define the shape of your request and response with Python type hints (via Pydantic models). FastAPI then automatically validates incoming requests and rejects malformed ones with a 422 error before they can reach, and crash, your model.</li><li><strong>Auto-generated documentation —</strong> FastAPI produces a live /docs page that lists every endpoint, what it accepts, and what it returns. This is invaluable when frontend developers (or your future self) need to understand how to call the API.</li></ol><p>The backend in this application exposes four endpoints, each serving a distinct purpose:</p><pre>- `POST /api/predict` - The core endpoint. Accepts a date and optional context parameters (weather, holidays, special events), runs them through the model, and returns a prediction with prediction intervals and a traffic tier label.<br>- `GET /api/historical` - Returns paginated historical visitor data with optional date filters. This powers the charts in the frontend&#39;s Historical tab.<br>- `GET /api/insights` - Returns model performance metrics, feature importance rankings, and a comparison table of all the models trained in Part 1. This powers the Insights tab.<br>- `GET /api/date-info` - A smart helper endpoint: given a date, it automatically detects Victorian public holidays and fetches weather forecasts. 
The frontend calls this whenever the user picks a date, so the prediction form populates itself without requiring manual input.</pre><p><strong>Loading the Model Once — <em>The Lifespan Pattern</em></strong></p><p>One of the first problems you’ll encounter when deploying an ML model is deciding when to load it. The naive approach is to load the model at the top of your script as a global variable. This works, but the load then happens as a side effect of simply importing the module (in tests and tooling too), and there is no natural place to release resources at shutdown. Keep in mind, too, that production web servers typically run multiple worker processes to handle concurrent requests. A variable in one process is not shared with another, so in general each worker loads and holds its own copy of the model.</p><p>A better approach, and I stand to be corrected, is FastAPI’s lifespan context manager. This is a function that runs once per worker process when the application starts up, before that process begins accepting any requests:</p><pre>from contextlib import asynccontextmanager<br><br>from fastapi import FastAPI<br><br># ModelService is the project model wrapper, defined elsewhere in the backend<br><br>@asynccontextmanager<br>async def lifespan(app: FastAPI):<br>    ModelService.load()  # startup: load everything before serving requests<br>    yield<br>    # any code placed here runs at shutdown<br><br>app = FastAPI(<br>    title=&quot;Museum Visitor Prediction API&quot;,<br>    description=&quot;Predict daily visitor numbers for the Melbourne Museum of Migration&quot;,<br>    version=&quot;1.0.0&quot;,<br>    lifespan=lifespan,<br>)</pre><p>The @asynccontextmanager decorator, combined with yield, is what turns this function into a context manager. Everything before yield runs at startup, and everything after yield (if you add cleanup code) runs at shutdown. The ModelService.load() call loads the XGBoost model from disk, reads the feature list, loads historical data, and computes residual statistics. 
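</p><p>You can check the load-once behaviour without a web server. The sketch below uses only the standard library and is not the repo’s code. ModelService here is a hypothetical, stripped-down stand-in whose “model” is a trivial function. simulate_server() is a hypothetical stand-in for what an ASGI server such as Uvicorn does with a lifespan handler: it enters the handler once at startup and leaves it at shutdown.</p>

```python
import asyncio
from contextlib import asynccontextmanager


class ModelService:
    """Hypothetical stand-in: all state lives on the class, never on an instance."""

    _model = None     # shared by every caller in this process
    _load_count = 0   # counts loads so the "load once" claim can be checked

    @classmethod
    def load(cls) -> None:
        # Placeholder for the expensive work (model file, feature list,
        # historical data, residual statistics); here just a trivial function.
        cls._model = lambda x: 2 * x
        cls._load_count += 1

    @classmethod
    def predict(cls, x: float) -> float:
        if cls._model is None:
            raise RuntimeError("ModelService.load() has not been called")
        return cls._model(x)


@asynccontextmanager
async def lifespan(app):
    ModelService.load()          # startup: runs once, before any request
    yield
    ModelService._model = None   # shutdown: optional cleanup


async def simulate_server() -> list:
    # Hypothetical stand-in for the ASGI server: enter the lifespan once,
    # handle several "requests", then exit the lifespan on shutdown.
    async with lifespan(app=None):
        return [ModelService.predict(x) for x in (1, 2, 3)]


results = asyncio.run(simulate_server())
print(results, ModelService._load_count)  # → [2, 4, 6] 1
```

<p>Three predictions are served from a single load, and the cleanup after yield runs only once the simulated server shuts down.</p><p>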
All of that loading is heavy, one-time work that should not be repeated on every request.</p><p><strong>The ModelService Singleton</strong></p><p>The ModelService class is designed as a singleton: a software pattern in which a class has at most one instance, shared across the whole application.</p><p>Rather than creating an instance in the usual way (model = ModelService()), all the methods are marked as @classmethod, so they operate on the class itself rather than on an instance. The model, feature list, historical data, and residual stats are stored as class-level variables, not instance variables. Strictly speaking, no instance is ever created; the class itself acts as the single shared object. The effect is straightforward: within each worker process, there’s exactly one copy of the model in memory no matter how many API calls come in simultaneously, and every call reads from the same shared state.</p><pre>Normal class: ModelService() → creates new object every time<br>Singleton pattern: ModelService.predict(…) → always uses the same shared state</pre><p>This matters because XGBoost models can be large (tens or hundreds of megabytes). Loading one per request would add the full load time to every call and, under concurrent traffic, keep many copies in memory at once. Loading it once at startup and sharing it is the standard production pattern. Don’t worry if this doesn’t fully make sense yet, as I’ll dig deeper into the interesting parts of the code.</p><p>Before running any of these steps, follow this <a href="https://github.com/wandabwa2004/museum-visitor-prediction/blob/dev/article/setup_guide.md">setup guide</a> to get the backend running. 
If the setup worked correctly, you should see output like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/620/1*smLXTXkpV8oqPTDRhQzvIw.png" /><figcaption>A working backend</figcaption></figure><p>Remember that all of this is set up for a local deployment.</p><p>When a user clicks the <strong>Predict</strong> button in the frontend (which I’ll describe later), this is what happens:</p><ul><li><strong>Request arrives — </strong>The frontend sends an HTTP POST request to /api/predict with a JSON body. The body contains the date and any contextual parameters (weather, holidays, events) that the user provided or the auto-fetch populated.</li><li><strong>Feature construction — </strong>The raw request has only a handful of fields, but the model was trained on 81 features. The <em>build_feature_row()</em> function bridges this gap. It takes the date and context, looks up recent visitor history (needed for lag features), and applies the same cyclical encodings and interaction terms that were built during feature engineering in <a href="https://medium.com/data-science-collective/stop-guessing-staffing-needs-predicting-daily-museum-visitors-before-they-arrive-part-1-336ac0ca4f60">Part 1</a>. It then returns a single-row DataFrame with all 81 features in the exact order the model expects. This step is one of the most common failure points in ML deployments, because it is where <strong>training-serving skew</strong> creeps in: if the feature engineering code in the backend differs even slightly from what was done during training, the model will silently produce wrong predictions. I stored the feature list in features.json (the same file produced by the training notebook), and the backend reads that file and uses it as the authoritative column ordering. That guards the column names and order; the feature calculations themselves still have to match the training code.</li><li><strong>Model inference and inverse transformation — </strong>Something interesting happens in the code below. 
In Part 1, the XGBoost model was trained on log1p(visitors) rather than on visitors directly. If you remember, this was because visitor counts were right-skewed: on most days the museum received around 200–400 visitors, but some peak days spiked to approximately 1,200. This skew can confuse tree-based models that minimise squared error, making them <strong><em>underpredict</em></strong> peaks to avoid large penalties on the majority of ordinary days. Taking the log compresses the scale: log1p(200) ≈ 5.3 and log1p(1200) ≈ 7.1. The model then sees a more symmetric distribution and learns better. But the prediction comes back in log-space, so we have to invert it: np.expm1() converts the log-space output back to visitor counts (in the code, a stored offset, _xgb_log_offset, is added to the raw prediction first). The max(0, point) call then clips any negative predictions (which are physically impossible for visitor counts) to zero.</li></ul><pre>@classmethod<br>def predict(cls, feature_row: pd.DataFrame) -&gt; dict:<br>    raw = cls._model.predict(feature_row)<br>    # XGBoost predicts in log1p space, so invert with expm1; other models predict counts<br>    point = float(np.expm1(raw[0] + cls._xgb_log_offset)) \<br>        if cls._best_model_name == &quot;XGBoost&quot; else float(raw[0])<br>    point = max(0, point)  # visitor counts cannot be negative</pre><ul><li><strong>Prediction intervals — </strong>A single number (e.g., 347 visitors) is less useful than a range such as <em>“between 290 and 405 visitors with 80% confidence”</em>. Prediction intervals communicate uncertainty, which is critical for staffing decisions: if the interval is wide, schedule conservatively.</li></ul><pre># Fall back to 50 visitors if no residual std was stored<br>std = cls._residual_std or 50.0<br># 1.28 and 1.96 are the two-sided z-scores for 80% and 95% normal intervals<br>lower_80 = max(0, int(point - 1.28 * std))<br>upper_80 = int(point + 1.28 * std)<br>lower_95 = max(0, int(point - 1.96 * std))<br>upper_95 = int(point + 1.96 * std)</pre><p>The residual_std above is the standard deviation of the model’s prediction errors on the training data (the code falls back to 50 visitors if it isn’t set). 
Put simply, residuals are the differences between what the model predicted and what actually happened. For example, if the model predicted 350 but 410 visitors came in, the residual is 60. Computing the standard deviation of all residuals gives a single number that summarises how spread out the errors typically are.</p><p>The multipliers 1.28 and 1.96 come from the <a href="https://www.gcp-service.com/what-is-the-normal-distribution"><strong>normal distribution</strong></a>:</p><pre>- 80% of values fall within ±1.28 standard deviations of the mean<br>- 95% of values fall within ±1.96 standard deviations of the mean</pre><p>Therefore, <em>point ± 1.28 * std</em> gives us the 80% prediction interval: on roughly 8 out of 10 days, actual visitors should fall inside this range, provided the residuals are approximately normal. The 95% interval is wider because it needs to capture more of the possible outcomes. This is called a <a href="https://www.sciencedirect.com/topics/computer-science/gaussian-approximation"><strong><em>Gaussian approximation</em></strong></a> of the prediction interval. It’s not perfect (residuals are rarely exactly normal, and errors measured on training data tend to understate errors on unseen days), but it’s a practical, interpretable first approximation.</p><ul><li><strong>Traffic tier classification and response</strong></li></ul><pre>if point &lt; 250:   tier = &quot;Low&quot;<br>elif point &lt; 450: tier = &quot;Medium&quot;<br>else:              tier = &quot;High&quot;<br><br>return {<br>    &quot;predicted_visitors&quot;: int(round(point)),<br>    &quot;lower_80&quot;: lower_80, &quot;upper_80&quot;: upper_80,<br>    &quot;lower_95&quot;: lower_95, &quot;upper_95&quot;: upper_95,<br>    &quot;traffic_tier&quot;: tier,<br>    &quot;confidence&quot;: round(max(0.0, min(1.0, float(r2))), 3),<br>    &quot;model_used&quot;: cls._best_model_name,<br>}</pre><p>The traffic tier translates the raw number into an operational instruction. 
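</p><p>The three steps above are the inverse log transform, the Gaussian interval and the traffic tier. A minimal standalone sketch of them follows. It is not the repository’s ModelService code: summarise_prediction is a hypothetical name, and it takes a single log-space prediction instead of a DataFrame. It also derives the 1.28 and 1.96 multipliers from Python’s statistics.NormalDist rather than hard-coding them. The 50-visitor fallback and the 250/450 tier thresholds mirror the snippets above.</p>

```python
import math
from statistics import NormalDist
from typing import Optional

# Two-sided z-scores: an 80% interval leaves 10% in each tail, a 95% interval 2.5%
Z80 = NormalDist().inv_cdf(0.90)   # about 1.2816
Z95 = NormalDist().inv_cdf(0.975)  # about 1.9600


def summarise_prediction(log_pred: float, residual_std: Optional[float]) -> dict:
    """Turn a log1p-space prediction into a count, Gaussian intervals and a tier."""
    point = max(0.0, math.expm1(log_pred))  # undo log1p, clip impossible negatives
    std = residual_std or 50.0              # same fallback as the snippet above

    # Same thresholds as the article: Low below 250, Medium below 450, else High
    if point >= 450:
        tier = "High"
    elif point >= 250:
        tier = "Medium"
    else:
        tier = "Low"

    return {
        "predicted_visitors": int(round(point)),
        "lower_80": max(0, int(point - Z80 * std)),
        "upper_80": int(point + Z80 * std),
        "lower_95": max(0, int(point - Z95 * std)),
        "upper_95": int(point + Z95 * std),
        "traffic_tier": tier,
    }


# Hypothetical example: the model returns log1p(347) and the residual std is 45
print(summarise_prediction(math.log1p(347), 45.0))
```

<p>Deriving the multipliers from the normal quantile function makes it easy to offer other coverage levels later. For example, NormalDist().inv_cdf(0.95) gives the multiplier for a 90% interval. The same normality assumption discussed above still applies.</p><p>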
A museum operations manager doesn’t just need a number; they need an action, such as <em>call in extra staff, run normal operations, or reduce floor coverage</em>. <strong>High/Medium/Low</strong> maps directly to those three staffing modes. This is an example of designing the output around the decision it needs to support.</p><p><strong>The Date Info Endpoint — Smart Defaults That Reduce Friction</strong></p><p>The /api/date-info endpoint demonstrates something I think is under-appreciated in data product design: reducing the cognitive load of correct usage.</p><p>The prediction form needs several inputs beyond just the date: <em>Is it a public holiday? What will the weather be? Is there a school holiday?</em> A naive form would present empty text boxes and leave it to the user to look all this up. That creates friction and introduces human error. For example, the user might not know that Anzac Day (April 25th) is a Victorian public holiday, or might enter the wrong temperature.</p><p>Instead, when the user selects a date, the frontend automatically calls /api/date-info, which:</p><ol><li><strong>Checks the Victorian public holiday calendar </strong>— a hardcoded list of known dates plus rules for floating holidays (like Easter).</li><li><strong>Fetches weather from the Open-Meteo API </strong>— a free weather API that provides both historical data and 7-day forecasts. For dates in the forecast window, it returns the forecast. For past dates, it returns historical actuals. For dates beyond the forecast window (more than 7 days out), it falls back to seasonal defaults based on Melbourne’s average climate by month.</li></ol><p>The user sees the prediction form pre-populated with contextually appropriate values, each labelled with a badge showing its source: <strong>Auto</strong>, <strong>Forecast</strong>, <strong>Historical</strong>, or <strong>Seasonal Default</strong>. 
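</p><p>The endpoint makes two date-driven decisions: whether the date is a holiday, and which weather source applies. Both can be sketched with the standard library alone. Everything here is illustrative rather than the repository’s code, and the function names are hypothetical. Easter is computed with the well-known anonymous Gregorian (Meeus/Jones/Butcher) algorithm. Only a handful of Victorian holidays are included, with no substitute days for holidays that fall on weekends. The 7-day window mirrors the description above.</p>

```python
from datetime import date, timedelta


def easter_sunday(year: int) -> date:
    """Gregorian Easter Sunday via the anonymous (Meeus/Jones/Butcher) algorithm."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


def first_weekday(year: int, month: int, weekday: int) -> date:
    """First given weekday (Monday = 0) of a month."""
    first = date(year, month, 1)
    return first + timedelta(days=(weekday - first.weekday()) % 7)


def victorian_holidays(year: int) -> dict:
    """A partial, illustrative holiday table: fixed dates plus floating rules."""
    easter = easter_sunday(year)
    return {
        date(year, 1, 26): "Australia Day",
        date(year, 4, 25): "Anzac Day",
        easter - timedelta(days=2): "Good Friday",
        easter + timedelta(days=1): "Easter Monday",
        first_weekday(year, 11, 1): "Melbourne Cup Day",  # first Tuesday of November
    }


def weather_source(target: date, today: date, forecast_days: int = 7) -> str:
    """Pick the source badge for a date, following the rules described above."""
    if today > target:
        return "Historical"
    if forecast_days >= (target - today).days:
        return "Forecast"
    return "Seasonal Default"


print(victorian_holidays(2025).get(date(2025, 11, 4)))
print(weather_source(date(2025, 5, 12), today=date(2025, 5, 10)))
print(weather_source(date(2025, 6, 30), today=date(2025, 5, 10)))
```

<p>In a real endpoint, the Historical and Forecast branches would then call the weather API, and the Seasonal Default branch would read a per-month climate table. Both of those parts are omitted here.</p><p>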
If they want to override any of these values, they can: the form switches that field to manual mode and stops auto-fetching it. This preserves user intent while still providing smart defaults.</p><p>This is an example of UX design thinking applied to a data product. The goal is not just for the model to be accurate, but for the application to be <em>usable by someone who is not a data scientist</em>.</p><h4><strong>Frontend — React + Tailwind Dashboard</strong></h4><p>The frontend, shown below, is a React 18 application built with Vite and styled with Tailwind CSS; a live version is available <a href="https://museum-visitor-prediction.vercel.app/"><strong>HERE</strong></a>. It has three tabs, each serving a distinct purpose for a different type of user question.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*gC3OnWPijyJWt1Dj-8bbug.png" /><figcaption>Front-end Dashboard</figcaption></figure><p><strong>Why React? </strong>React is a JavaScript library that makes it practical to build UIs where the data and the visual representation stay in sync. For example, when a user selects a new date, the form fields, weather badges, and prediction result all need to update together. React’s component model makes this manageable: each piece of the UI is a self-contained component that re-renders when its data changes, instead of requiring you to manually update the DOM for every change.</p><p><strong>Why Vite?</strong> Vite is the build tool that bundles and serves the React application. During development, it provides hot module replacement: when you save a file, the browser typically updates in under a second without losing the current UI state. For production, it produces an optimised static bundle (HTML, CSS, and JavaScript files) that can be hosted on any static file host such as Vercel or Netlify.</p><p><strong>Why Tailwind CSS?</strong> Tailwind is a utility-first CSS framework. 
Instead of writing custom CSS classes, you apply small single-purpose utility classes directly in your JSX markup (e.g., <em>text-lg font-bold text-blue-600</em>). The result looks verbose but produces consistent, responsive designs without the overhead of managing a separate stylesheet.</p><p><strong>Predict Tab</strong></p><p>This is the primary view. It presents a form where the user selects a date, after which the weather and holiday fields are auto-populated via the /api/date-info call. The user can override any field if they have better information (e.g., they know a special event is happening that day).</p><p>When the user submits, a prediction result panel appears alongside the form, showing:</p><ul><li>The large point estimate (e.g., “538 visitors”)</li><li>The 80% and 95% prediction intervals displayed as a range bar</li><li>The traffic tier badge (Low / Medium / High) in a colour-coded chip</li><li>Estimated revenue, computed by multiplying the predicted visitor count by an assumed (customisable) average spend per visitor</li><li>The model name and its R² score, so the user understands which model produced this result</li></ul><p>Displaying prediction intervals rather than just a point estimate is a deliberate choice. A museum director who sees “<em>Low: 526 / High: 549</em>” immediately understands there is uncertainty in the prediction and can plan for the upper bound when in doubt. 
By contrast, someone who only sees “538” might staff for exactly 538 and be caught out on a day when 600 visitors arrive.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/958/1*f6KsEHEjl0ZOx70nEZOr2g.png" /><figcaption>Predictions Tab</figcaption></figure><p><strong>Historical Tab</strong></p><p>This tab shows interactive charts of historical visitor patterns over the past year, built with <a href="https://recharts.github.io/"><strong>Recharts</strong> </a>— a charting library for React that renders responsive SVG charts.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/952/1*IxQRFq0M0E0Bl99tIxGUGg.png" /><figcaption>Historical Visitor Numbers</figcaption></figure><p>The primary value here is context. Before making a staffing decision based on a prediction, a manager might want to know something like: <em>“Is next Saturday typically busy at this time of year? Has that changed recently?”</em> The historical chart lets them visually scan for seasonal patterns, spot anomalies, and build intuition about the data the model was trained on.</p><p>The data is fetched from the GET /api/historical endpoint, which supports date-range filtering and pagination so large date ranges don’t overwhelm the browser.</p><p><strong>Insights Tab</strong></p><p>This tab is aimed at more technical users, such as an internal analytics team or a curious manager who wants to understand <em>why</em> the model is making certain predictions.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/872/1*TCL_ZjNv85w4e3ZzFGm2aA.png" /><figcaption>Insights Tab</figcaption></figure><p>It shows:</p><ul><li><strong>Model performance metrics </strong>— MAE, RMSE, R², and MAPE for the best model, displayed as summary stat cards (I described some of them in Part 1).</li><li><strong>Feature importance rankings</strong> — the top features the model relies on, pulled from the SHAP analysis done in Part 1. 
This answers questions like <em>“Is the model mostly driven by day-of-week, or does weather matter a lot?”</em></li><li><strong>Model comparison table</strong> — shows all the models trained (Random Forest, XGBoost, Prophet) side by side, so the user can see how much better XGBoost performs than the alternatives and understand the trade-offs</li></ul><p>The summary stat cards at the top of the dashboard (average daily visitors, peak-day visitors, days of history, best model R², etc.) give any user immediate orientation before they start exploring. These numbers answer the first question anyone asks when they encounter a new dataset: <em>“What are we dealing with here?”</em></p><h4><strong>Deployment Considerations</strong></h4><p>As is, the frontend can be deployed to <a href="https://vercel.com/"><strong>Vercel</strong> </a>(static files, CDN-served) and the backend to <a href="https://railway.com/"><strong>Railway</strong></a> (Python process, model artifacts bundled). Keeping them separate means you can redeploy a CSS fix without touching the model server, and scale each independently. If you want to add a database layer like I did for prediction logging, the repo includes a <a href="https://supabase.com/">Supabase </a>setup — see <a href="https://github.com/wandabwa2004/museum-visitor-prediction/blob/dev/article/setup_guide.md">setup_guide.md</a>.</p><p>Remember to supply configuration through environment variables so that secrets and environment-specific values stay out of source code:</p><pre># Backend<br>APP_ENV=production<br># Frontend<br>VITE_API_URL=https://your-backend.railway.app</pre><p>One security note to consider before going live: the backend’s CORS settings currently use allow_origins=[&quot;*&quot;], which lets any website call your API from a browser. 
Remember to restrict it to your frontend domain in production as follows:</p><pre>allow_origins=[&quot;https://museum-dashboard.vercel.app&quot;]</pre><h4><strong>Would I do it differently with real data?</strong></h4><p>Building on synthetic data is a great way to develop and validate a system end-to-end, but I want to be honest about where the assumptions break down.</p><ol><li><strong>Data quality is the hard part — </strong>Synthetic data is clean by construction. Every day has exactly one record, all fields are populated, and the relationships between variables are consistent because I programmed them to be. Real visitor logs are messy. A real data pipeline would encounter missing days (scanners were down, the museum was closed for maintenance, or records simply weren’t captured), duplicate entries (the same group scanned through multiple times), scanner malfunctions (e.g., a door counter stuck reporting 12 visitors per hour regardless of actual traffic; such systematic errors are hard to detect without cross-checking against staff headcounts or revenue data), and inconsistent recording (different staff members classifying “school group” visits differently across years, making that feature unreliable as a historical signal). Budget significant time for data cleaning, often more than for modelling itself, because even a sophisticated model cannot compensate for unreliable input data.</li><li><strong>Features you cannot simulate — </strong>Real venues have data signals that I couldn’t credibly replicate in my experiments, such as:</li></ol><ul><li>Social media mentions — for example, a viral Instagram post about a new exhibition might drive a spike in visitors 3–5 days later. Social listening APIs can quantify this.</li><li>Google search trends — search volumes for the museum’s name can be a leading indicator of visit intent, available from Google Trends.</li><li>Nearby parking occupancy and public transport ridership — visitors can’t come if they can’t park or get there. 
Free public transport in Victoria is another factor that could spike the numbers.</li><li>Competitor events — a major festival at a nearby venue can either draw visitors away (substitution) or increase foot traffic to the whole area (complementarity).</li></ul><p>Each of these would require integration work but could meaningfully improve predictions. Because XGBoost is good at learning from tabular signals, adding better signals is often more valuable than tuning the model itself.</p><p><strong>3. Model retraining — </strong>A model trained today will degrade over time, a phenomenon known as <a href="https://medium.com/@anicomanesh/model-drift-identifying-and-monitoring-for-model-drift-in-machine-learning-engineering-and-0f74b2aa2fb0">model drift</a>. The patterns in the world change, but the model’s learned parameters don’t. For a museum, drift sources include a new exhibition type that attracts a different visitor demographic, a pricing change that affects visit frequency, nearby construction that reduces foot traffic for several months, or a global event (a pandemic, an economic recession) that fundamentally shifts cultural activity patterns.</p><p>The practical solution to such changes is a scheduled retraining pipeline. Monthly or quarterly, retrain the model on accumulated historical data (including recent data not available at initial training time), compare it against the current production model on a holdout period, and promote the new model if it performs better. A prediction log table that records inputs, outputs, and eventually actual visitor counts is the foundation for this, as it gives you a growing dataset of prediction errors that reveals whether the model is drifting.</p><p><strong>4. Shadow mode before going live — </strong>Before using model predictions to make actual staffing decisions, run the model in shadow mode for several weeks. 
The model generates predictions silently in the background and logs them, but they’re not shown to the operations team, who continue making decisions based on their existing process.</p><p>After a few weeks, compare both the model’s predictions and the team’s existing estimates against actual visitor counts.</p><p>This serves two purposes. First, it validates that the model is actually better than the human baseline before it influences real decisions. If it’s not, you need to understand why before deploying it. Second, it builds trust with the operations team. When they can see side by side that the model’s numbers were closer to reality on, say, 73% of days, they stop treating it as a black box and start treating it as a useful tool.</p><p>Skipping shadow mode and deploying directly to operations is a common and costly mistake. The model might be good, but if the team doesn’t trust it, they’ll ignore it. If it makes a bad call in week one (and every model makes bad calls eventually), they’ll dismiss it entirely and go back to manual estimates. Shadow mode lets you demonstrate value before you ask people to change their behaviour.</p><h4><strong>Conclusion</strong></h4><p>That’s it for now. Across these two articles, I’ve taken you through the entire process of building a visitor prediction system: simulating realistic museum data, engineering 80+ features, training and comparing multiple models, explaining them with SHAP, and deploying the result as a full-stack application with a FastAPI backend and React dashboard.</p><p>A few things worth noting from the two articles (Part 1 is <a href="https://medium.com/data-science-collective/stop-guessing-staffing-needs-predicting-daily-museum-visitors-before-they-arrive-part-1-336ac0ca4f60">here</a>). XGBoost wins with a MAPE of 14.0%, which is accurate enough for operations planning. 
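</p><p>Two calculations from this article reduce to a few lines of arithmetic. One is the headline MAPE figure. The other is the shadow-mode comparison of model versus team estimates. The sketch below uses hypothetical function names and made-up toy numbers purely to show the calculations. It does not reproduce any figures from the article.</p>

```python
def mape(actual: list, predicted: list) -> float:
    """Mean absolute percentage error in percent (assumes no zero actual values)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


def share_model_closer(actual: list, model: list, team: list) -> float:
    """Fraction of days on which the model strictly beat the team's estimate.

    Ties count as no win for the model.
    """
    wins = sum(abs(a - t) > abs(a - m) for a, m, t in zip(actual, model, team))
    return wins / len(actual)


# Toy numbers for illustration only
actual = [300, 400, 500, 350]
model = [310, 380, 450, 360]
team = [350, 390, 520, 300]

print(round(mape(actual, model), 2))
print(share_model_closer(actual, model, team))
```

<p>Whichever tie convention you pick, fix it before the shadow period starts so the comparison can’t be tuned after the fact.</p><p>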
A 14% average error on a day with 350 expected visitors means being off by about 49 people, which is manageable for staffing adjustments. Seasonality and context dominate: <em>day-of-year, day-of-week</em>, and <em>contextual factors (boost count, weather, holidays)</em> outrank individual lag features in SHAP importance. The model is essentially learning that <em>“Saturdays in summer are busy; rainy Tuesdays are not.”</em> Feature engineering matters more than model selection, as most of the performance gain comes from the engineered features. A well-engineered feature set with a simple model often beats a poorly engineered one with a sophisticated model. Finally, business framing is everything: traffic tier classification and revenue estimates speak louder than R² scores. An R² of 0.89 means little to a museum director, whereas a <em>Low/Medium/High</em> staffing tier is immediately actionable.</p><p>If you combine this temporal prediction with the spatial analytics from my <a href="https://medium.com/data-science-collective/finding-the-perfect-spot-wifi-analytics-for-premium-space-advertising-b8470dd9181b">WiFi article</a>, you have a complete visitor intelligence system: you can predict <em>how many</em> people will come, then optimise <em>where</em> they go within the venue. This sounds like a million-dollar startup idea 🚀.</p><p>As always, all the code is open-sourced <a href="https://github.com/wandabwa2004/museum-visitor-prediction">here</a>. Clone the repo, run the notebooks in order (01 through 05), spin up the <em>backend</em> with <em>uvicorn app.main:app --reload</em> and the <em>frontend</em> with <em>npm run dev</em>. Please feel free to adapt it for your own venue, whether it’s a museum, gallery, shopping mall, or event space.</p><p>I hope this walkthrough was useful. Don’t forget to follow me, clap for me, and leave a comment. 
Let me know if you’d like me to extend this to real-time prediction with streaming data or visitor segmentation (families vs solo visitors vs groups). If you want to check out my other articles, you can find them on my <a href="https://medium.com/@hermanwandabwa">profile</a>. I’m also happy to connect via <a href="https://www.linkedin.com/in/wandabwaherman/">LinkedIn</a>.</p><hr><p><a href="https://medium.com/data-science-collective/stop-guessing-staffing-needs-how-id-predict-daily-museum-visitors-before-they-arrive-part-2-760498e03c4f">From Jupyter Notebook to Full-Stack ML App with FastAPI, React, and Supabase</a> was originally published in <a href="https://medium.com/data-science-collective">Data Science Collective</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Stop Guessing Staffing Needs: Predicting Daily Museum Visitors Before They Arrive (Part 1)]]></title>
            <link>https://medium.com/data-science-collective/stop-guessing-staffing-needs-predicting-daily-museum-visitors-before-they-arrive-part-1-336ac0ca4f60?source=rss-58e995b6d0e3------2</link>
            <guid isPermaLink="false">https://medium.com/p/336ac0ca4f60</guid>
            <category><![CDATA[data-analysis]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[business]]></category>
            <dc:creator><![CDATA[Herman Wandabwa]]></dc:creator>
            <pubDate>Mon, 06 Apr 2026 01:55:46 GMT</pubDate>
            <atom:updated>2026-04-25T02:34:18.651Z</atom:updated>
            <content:encoded><![CDATA[<h3>Predicting Daily Museum Visitors for Better Staffing Decisions</h3><h4>How I built an XGBoost forecasting pipeline to turn visitor demand into staffing insight for a Melbourne museum.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*NE_6oLcOKfSGKkwOr1CV3A.png" /><figcaption>Image generated by the author</figcaption></figure><p><strong>Author’s note:</strong> <em>This article uses independently created code and synthetic data for illustration. The methods described are standard forecasting and machine learning techniques commonly used across many application areas.</em></p><p><strong>Stuck behind a paywall? Read the article for free </strong><a href="https://hermanwandabwa.medium.com/stop-guessing-staffing-needs-predicting-daily-museum-visitors-before-they-arrive-part-1-336ac0ca4f60?source=friends_link&amp;sk=bf7b550e648beb7d5690552680531038"><strong>HERE</strong></a></p><p>It’s been a while since I wrote my last article on <a href="https://hermanwandabwa.medium.com/finding-the-perfect-spot-wifi-analytics-for-premium-space-advertising-b8470dd9181b?sk=79f7744a1237aa3f5eb63826a2e2b82e">WiFi Analytics</a>. For context, in it I outlined a simple process for analysing foot traffic patterns from WiFi pings to identify the best locations in malls for premium advertisements or exhibitions. It was a graph-inspired analysis that ranked locations by reach, dwell time, and flow. You can read it for <a href="https://hermanwandabwa.medium.com/finding-the-perfect-spot-wifi-analytics-for-premium-space-advertising-b8470dd9181b?sk=79f7744a1237aa3f5eb63826a2e2b82e">free </a>to get the whole idea. As usual, I also <a href="https://github.com/wandabwa2004">shared all the code</a> for reuse.</p><p>But here’s the thing: knowing where people go is only half the puzzle. The other half is knowing how many are likely to show up in the first place. 
I’m sure every mall, museum, or exhibition operations manager has had sleepless nights over the question <em>“How many visitors should we expect today?”</em></p><p>This is what led me to write this article. It is a natural progression from the WiFi analytics piece and is split into two parts. Whereas the WiFi piece focused on spatial patterns, this one (Part 1) focuses on temporal prediction of visitor numbers. Contrary to what some people might think, this is not a simple process. I’ll try to paint a picture of how visitor numbers can be predicted in such venues.</p><p>I also pivoted a little from the shopping mall use case and focused on the museum space, specifically the <strong>Immigration</strong> <strong>Museum </strong>in <strong>Melbourne, Australia</strong>. This venue actually exists, and I chose it for several reasons. Melbourne’s Southern Hemisphere seasons, Victorian public holidays, and unpredictable weather make for an interesting prediction challenge, and all of them become features in the modelling process. In addition, and this is personal, it’s the only museum that I pass by when travelling into the city of Melbourne. Maybe that’s why the name and interest stuck.</p><p>As outlined earlier, Part 1 covers the synthetic data generation process through to feature engineering, model training, and evaluation. Part 2 will cover deployment with a production-ready FastAPI backend and a React dashboard. Together, they form an end-to-end pipeline that anyone can leverage when building a related solution. As always, all code is open-sourced <a href="https://github.com/wandabwa2004/museum-visitor-prediction/tree/dev/notebooks">here</a>. Please feel free to clone the repo and adapt it for your own venue.</p><h4><strong>1. 
The Problem and Why Prediction Matters</strong></h4><p>Museums and cultural venues face daily operational trade-offs: staff too many people on a quiet Tuesday and you&#39;re burning budget; staff too few on a public holiday or weekend and visitor experience suffers. The same logic applies to related areas like ticketing, catering, and exhibit rotations.</p><p>A properly thought-through predictive model captures the factors that drive demand (day of week, season, holidays, weather, and events) simultaneously. It outputs a number along with a prediction interval and even a traffic tier that the operations team can actually rely on and act on. With such a model in place, operations managers can ask <em>&quot;Do we need the Low, Medium, or High staffing plan today?&quot;</em> instead of <em>&quot;How many visitors will we get?&quot;</em>. Enough of the theory. Let&#39;s jump to the interesting bits.</p><h4><strong>2. Dataset Design &amp; Simulation</strong></h4><p>The first challenge in any predictive modelling project is data. I didn’t have access to real museum visitor logs, so I opted to generate a synthetic dataset that largely reflects realistic museum visitation patterns. This approach is also useful if you want to control the signal-to-noise ratio and demonstrate a methodology before deploying it on messy real-world data.</p><p>I simulated a visitation dataset spanning three years (January 2023 to December 2025). 
This amounts to 1,096 daily records (2024 is a leap year).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*vjqGBQDWbFx2bQMblOtwIg.png" /><figcaption>Visitor Counts over time</figcaption></figure><p>The grey line in the plot is the raw daily count; the red line is the 30-day moving average, which reveals the seasonal rhythm: summer peaks, winter troughs, and a slight upward trend over the three years.</p><p>Each record captures the following:</p><ul><li><strong>Temporal features</strong>: <em>day of week, month, quarter, week of year</em></li><li><strong>Weather</strong>: <em>temperature (Melbourne-specific ranges), precipitation, weather type (Sunny/Partly Cloudy/Cloudy/Rainy)</em></li><li><strong>Calendar events</strong>: <em>Victorian public holidays (Australia Day, Anzac Day, Melbourne Cup Day, etc.) and school holidays</em></li><li><strong>Venue events</strong>: <em>special exhibitions (9 across 3 years, each lasting 2–3 months), local events (F1 Grand Prix, Arts Festival), marketing campaigns, ticket promotions</em></li></ul><p>This approach will inevitably deviate from real data, but the key here is realism. For example, Melbourne’s climate is distinctive: January averages about 20°C while July averages 9.5°C (Southern Hemisphere, remember). Some Victorian public holidays are also unique to the state, such as Melbourne Cup Day on the first Tuesday of November.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*OavpSIGnRPi0pVypMDef-g.png" /><figcaption>Weather Impact on Visitor Numbers</figcaption></figure><p>From the data and the plot above, sunny days average noticeably higher visitor counts than rainy ones, and there’s a clear positive relationship between temperature and visitor numbers. 
Anyone who’s experienced a Melbourne winter knows outdoor plans take a hit, and this is exactly what you’d expect for an outdoor-accessible venue in Melbourne.</p><figure><img alt="" src="https://cdn-images-1.medium.com/proxy/1*ijRXrsgFiIGg5h3TbhUpPA.png" /><figcaption>Temporal Patterns in Visitor Traffic</figcaption></figure><p>The plot above shows four views of the same temporal story: weekends (Saturday and Sunday, highlighted in red) draw 30 to 45% more visitors than midweek; December and January peak at ~40% above average thanks to the summer holidays; and Q1 and Q4 dominate quarterly traffic. The heatmap in the bottom-right reveals a deeper interaction: a Saturday in December is the busiest cell in the entire grid.</p><p>In the simulation, the visitor count itself is generated by applying <em>multiplicative factors</em> to a base of 250 visitors/day (an arbitrary number that can be changed):</p><pre># Day of week effect (0 = Monday, 6 = Sunday)<br>day_multiplier = {0: 0.90, 1: 0.85, 2: 0.88, 3: 0.92, 4: 1.00, 5: 1.30, 6: 1.25}<br>df[&#39;visitors&#39;] *= df[&#39;day_of_week&#39;].map(day_multiplier)<br><br># Seasonal effect (Southern Hemisphere)<br>season_multiplier = {<br>    1: 1.40, 2: 1.35, 3: 1.15, 4: 1.05, 5: 0.90, 6: 0.80,<br>    7: 0.75, 8: 0.85, 9: 1.00, 10: 1.10, 11: 1.20, 12: 1.45<br>}<br>df[&#39;visitors&#39;] *= df[&#39;month&#39;].map(season_multiplier)<br><br># Holiday effects<br>df.loc[df[&#39;is_public_holiday&#39;] == 1, &#39;visitors&#39;] *= 1.50<br>df.loc[df[&#39;is_school_holiday&#39;] == 1, &#39;visitors&#39;] *= 1.30<br><br># Weather impact (weather_factor and temp_factor columns are initialised earlier, not shown)<br>df.loc[df[&#39;weather_type&#39;] == &#39;Sunny&#39;, &#39;weather_factor&#39;] = 1.20<br>df.loc[df[&#39;weather_type&#39;] == &#39;Rainy&#39;, &#39;weather_factor&#39;] = 0.70<br>df[&#39;visitors&#39;] *= df[&#39;weather_factor&#39;] * df[&#39;temp_factor&#39;]<br><br># Event and promotion effects<br>df.loc[df[&#39;special_exhibition&#39;] == 1, &#39;visitors&#39;] *= 
1.25<br>df.loc[df[&#39;local_event&#39;] == 1, &#39;visitors&#39;] *= 1.40</pre><p>A real venue’s base number will almost certainly differ, but I hope you get the point. Beyond the core multipliers, each factor has a measurable impact on visitor numbers. The chart below shows the average uplift from each binary factor: public holidays boost attendance by ~50%, local events by ~40%, and even marketing campaigns deliver a measurable ~15% lift. Remember that all these factors are hypothetical, but I believe they broadly reflect reality.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ucOaOCj_Yd-H9IBrFt8rGw.png" /><figcaption>Impact of Key Factors on Visitor Numbers</figcaption></figure><p>Each panel in the plot above compares average daily visitors with and without that factor active. The percentage labels show the relative uplift. Public holidays and local events dominate, but the smaller effects (marketing, promotions) are still meaningful when they stack together.</p><p>I also added 12% Gaussian noise, a 2% outlier rate (days with 1.5 to 2.5x normal traffic), and a slow growth trend (~0.01% daily). The final values are clipped between 50 and 1,200 visitors. These choices create a dataset that’s statistically rich enough to demonstrate the methodology while remaining reproducible. Remember, this is a simulated dataset and should not be used as a replacement for real data.</p><p>Looking at the year-over-year monthly averages, you can see the seasonal pattern is consistent across all three years, with 2025 sitting slightly above 2023 thanks to the slow growth trend baked into the simulation.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*iqJjVIncI4IwZbGEpx7_sA.png" /><figcaption>Year-over-Year Monthly Comparison</figcaption></figure><p>All three years follow the same seasonal arc. 
There is a summer peak in December and January, a winter trough in June and July, and a gentle upward shift each year reflecting the long-term growth factor.</p><p>The resulting distribution shows the right-skewed shape typical of real visitor data: most days cluster between 200 and 400 visitors, with a long tail stretching past 800 on peak days.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*vQIgkR7n4wwuDn_CQ52nDA.png" /><figcaption>Visitor Count Distribution</figcaption></figure><p>The histogram on the left shows the characteristic right skew (skewness ≈ 0.9), with the mean sitting above the median and a handful of high-traffic outlier days pulling the average upward. The Q-Q plot confirms the departure from normality in the tails, motivating the log transform we’ll apply during feature engineering.</p><h4><strong>3. Feature Engineering</strong></h4><p>As I’ve argued in previous articles, raw data can get you baseline predictions, but engineered features are far more likely to get you good ones. Starting from ~26 raw columns, I engineered 81 features grouped into eight categories:</p><pre>Feature groups:<br>  Temporal     : 15<br>  Calendar     :  2<br>  Events       :  5<br>  Weather      :  7<br>  Season       :  4<br>  Interaction  :  7<br>  Lag          : 14<br>  Rolling      : 27</pre><p>Here are the most important ones:</p><p><strong>a) Cyclical Encoding</strong> — Why Day 0 and Day 6 Should Be “Close”</p><p>Here’s a subtle problem with temporal features. If you feed day_of_week directly into a model as values 0 to 6, the model treats Monday (0) and Sunday (6) as being far apart. In reality, they&#39;re adjacent days. The same applies to months. 
December (12) and January (1) are close, not opposites.</p><p>The fix is cyclical encoding using sin/cos transforms:</p><pre>import numpy as np<br><br># Day of week: 0-6 (Monday = 0) -&gt; cyclical<br>df[&#39;dow_sin&#39;] = np.sin(2 * np.pi * df[&#39;day_of_week&#39;] / 7)<br>df[&#39;dow_cos&#39;] = np.cos(2 * np.pi * df[&#39;day_of_week&#39;] / 7)<br><br># Month: 1-12 -&gt; cyclical<br>df[&#39;month_sin&#39;] = np.sin(2 * np.pi * (df[&#39;month&#39;] - 1) / 12)<br>df[&#39;month_cos&#39;] = np.cos(2 * np.pi * (df[&#39;month&#39;] - 1) / 12)<br><br># Day of year: 1-365 -&gt; cyclical<br>df[&#39;doy&#39;] = df[&#39;date&#39;].dt.dayofyear<br>df[&#39;doy_sin&#39;] = np.sin(2 * np.pi * df[&#39;doy&#39;] / 365)<br>df[&#39;doy_cos&#39;] = np.cos(2 * np.pi * df[&#39;doy&#39;] / 365)</pre><p>Think of it this way: you’re placing each day on the face of a clock. Monday and Sunday end up right next to each other, and December sits beside January. Both sin and cos components are needed because sin alone can map two different positions to the same value (e.g., February and June both have a month sin value of 0.5, but their cos values are +0.87 and -0.87).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/707/1*3YizfvBOmkq9Pw1lsEee6w.png" /><figcaption>Cyclical Encoding</figcaption></figure><p>On the left, days of the week are mapped onto a circle via <em>sin/cos</em>. On the right, months are placed on the same kind of circle. Both <em>sin </em>and <em>cos</em> components are needed to uniquely identify each position on the circle.</p><p>I applied this to day-of-week, month, day-of-year, and week-of-year.</p><p><strong>b) Lag Features —</strong> “Yesterday Predicts Tomorrow”</p><p>Recent history turned out to be one of the most important predictive signals in this dataset. For example, if 400 people visited yesterday, chances are today will be closer to 400 than to 200. 
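</p><p>The intuition that yesterday predicts today can be checked directly by measuring autocorrelation. The sketch below is a minimal illustration on a hypothetical synthetic series (a weekly rhythm on top of a slowly drifting level, not the article’s dataset); it uses pandas’ <em>Series.autocorr</em> to correlate each day with the day 1, 3, 7 and 14 days earlier. The series parameters and the lags chosen are assumptions for illustration only.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: a weekly rhythm on top of a slowly drifting level.
rng = np.random.default_rng(42)
days = pd.date_range("2023-01-01", periods=365, freq="D")
weekly = np.where(days.dayofweek >= 5, 1.3, 0.9)      # Sat/Sun busier than weekdays
level = 300 + np.cumsum(rng.normal(0, 5, len(days)))  # random-walk baseline
noise = rng.normal(0, 20, len(days))
visitors = pd.Series(level * weekly + noise, index=days)

# Pearson correlation between each day and the day `lag` days earlier.
acf = {lag: visitors.autocorr(lag=lag) for lag in [1, 3, 7, 14]}
for lag, r in acf.items():
    print(f"lag {lag:>2}: r = {r:.2f}")
```

<p>Because a day and the day exactly one week earlier always fall on the same weekday, the lag-7 correlation on a series like this comes out much stronger than lag 3, and that is the kind of structure lag features are designed to exploit.</p><p>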
Therefore, I created lag features at strategic intervals:</p><pre># Point lags: the visitor count N days earlier<br>for lag in [1, 2, 3, 14, 21, 28, 60, 90]:<br>    df[f&#39;visitors_lag_{lag}&#39;] = df[&#39;visitors&#39;].shift(lag)<br><br># Same-weekday lags (7, 14, 21 and 28 days back always land on the same weekday)<br>for k in [7, 14, 21, 28]:<br>    df[f&#39;lag_{k}_same_dow&#39;] = df[&#39;visitors&#39;].shift(k)<br><br># Same weekday average (last 4 occurrences)<br>df[&#39;mean_last_4_same_dow&#39;] = df[[&#39;lag_7_same_dow&#39;,&#39;lag_14_same_dow&#39;,<br>                                  &#39;lag_21_same_dow&#39;,&#39;lag_28_same_dow&#39;]].mean(axis=1)</pre><p>Why these specific intervals? Lags 1 to 3 capture short-term momentum. Lag 7 captures the same-day-last-week pattern (crucial for museums, where weekdays and weekends behave very differently). Lags 14, 21, and 28 capture fortnightly and monthly rhythms, and lags 60 and 90 capture seasonal trends. The same-weekday average of the last 4 weeks is particularly powerful because it answers the question: <em>“How many people typically visit on a Wednesday?”</em></p><p>The autocorrelation plot from the EDA confirms why these lag choices matter. There’s a clear spike at lag 7 (the weekly cycle) and a gradual decay that persists out to 60+ days, reflecting the seasonal structure.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-XcsVAgm8BA1pucuKFV9Qg.png" /><figcaption>Autocorrelation Function</figcaption></figure><p>The ACF shows significant autocorrelation at multiples of 7 days (the weekly rhythm), while the PACF reveals that lags 1 and 7 carry the most independent predictive information, exactly the lags we prioritised in feature engineering.</p><p><strong>c) Rolling Window Statistics</strong></p><p>Beyond point lags, I computed rolling means, standard deviations, maximums, and minimums across multiple windows:</p><pre>for window in [3, 7, 14, 30, 60, 90]:<br>    df[f&#39;rolling_mean_{window}d&#39;] = df[&#39;visitors&#39;].shift(1).rolling(window).mean()<br>    df[f&#39;rolling_std_{window}d&#39;]  = df[&#39;visitors&#39;].shift(1).rolling(window).std()<br>    df[f&#39;rolling_max_{window}d&#39;]  = 
df[&#39;visitors&#39;].shift(1).rolling(window).max()<br>    df[f&#39;rolling_min_{window}d&#39;]  = df[&#39;visitors&#39;].shift(1).rolling(window).min()<br><br># Exponentially Weighted Moving Average (EWMA) - (more weight on recent days)<br>df[&#39;ewma_7d&#39;]  = df[&#39;visitors&#39;].shift(1).ewm(span=7, adjust=False).mean()<br>df[&#39;ewma_14d&#39;] = df[&#39;visitors&#39;].shift(1).ewm(span=14, adjust=False).mean()<br>df[&#39;ewma_30d&#39;] = df[&#39;visitors&#39;].shift(1).ewm(span=30, adjust=False).mean()</pre><p>The shift(1) is critical here as it prevents data leakage by ensuring we only use information available before the prediction date. The EWMA variants give extra weight to recent observations, which is also useful because visitor patterns can shift gradually (e.g., after a new exhibition opens).</p><p><strong>d) Interaction Features </strong>— When Combinations Matter</p><p>Some effects compound. For example, a sunny weekend during school holidays isn’t just “sunny” + “weekend” + “school holiday”. It’s a perfect storm that drives numbers far above what any individual factor predicts. I captured this with interaction features:</p><pre>df[&#39;weekend_x_public_holiday&#39;] = df[&#39;is_weekend&#39;] * df[&#39;is_public_holiday&#39;]<br>df[&#39;exhibition_x_weekend&#39;] = df[&#39;special_exhibition&#39;] * df[&#39;is_weekend&#39;]<br>df[&#39;sunny_weekend&#39;] = ((df[&#39;weather_type&#39;] == &#39;Sunny&#39;) &amp; (df[&#39;is_weekend&#39;] == 1)).astype(int)<br>df[&#39;marketing_x_exhibition&#39;] = df[&#39;marketing_campaign&#39;] * df[&#39;special_exhibition&#39;]<br><br># Count of active boost factors<br>df[&#39;boost_count&#39;] = (<br>    df[&#39;is_public_holiday&#39;] + df[&#39;is_school_holiday&#39;] +<br>    df[&#39;special_exhibition&#39;] + df[&#39;local_event&#39;] +<br>    df[&#39;marketing_campaign&#39;] + df[&#39;ticket_promotion&#39;]<br>)</pre><p>The <em>boost_count </em>feature is a favourite of mine. 
It answers a simple question: <em>“How many things are happening today that attract visitors?”</em> A day with three active boost factors will almost certainly see more visitors than a day with one. The chart below quantifies this stacking effect.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/712/1*2ycNjM8aIMnnKS9ybpSqUA.png" /><figcaption>Average Visitors by Boost Count</figcaption></figure><p>The relationship is nearly monotonic: each additional active factor adds roughly 50 to 100 visitors on average. Days with zero boost factors average around 300 visitors; days with 3+ factors routinely exceed 500.</p><p><strong>e) Melbourne Seasons (Southern Hemisphere)</strong></p><p>This might seem trivial, but it’s the kind of detail that trips up models built on Northern Hemisphere assumptions. December in Melbourne is summer, not winter:</p><pre>def get_season(month):<br>    &quot;&quot;&quot;Map a month number (1-12) to its Southern Hemisphere season.&quot;&quot;&quot;<br>    if month in [12, 1, 2]: return &#39;Summer&#39;<br>    elif month in [3, 4, 5]: return &#39;Autumn&#39;<br>    elif month in [6, 7, 8]: return &#39;Winter&#39;<br>    else: return &#39;Spring&#39;</pre><p>January visitors run ~40% above average (summer holidays), while July drops to ~25% below average (winter). Northern Hemisphere labels would tag the busiest months as ‘Winter’ and the quietest as ‘Summer’, pointing the season features in exactly the wrong direction.</p><p><strong>f) Log Transform of the Target</strong></p><p>The visitor count distribution is right-skewed: most days cluster between 200 and 400, but outlier days push above 800. A log transform (<em>log1p</em>) pulls in that long tail and makes the distribution closer to normal, so a handful of extreme days don’t dominate training:</p><pre># log1p(x) = log(1 + x); predictions are mapped back with np.expm1<br>df[&#39;visitors_log&#39;] = np.log1p(df[&#39;visitors&#39;])</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/1*0TDAPdZgtw4i8XP86IOL4Q.png" /><figcaption>Log Transformation on Visitor Numbers</figcaption></figure><p><strong>Left</strong>: the original distribution with a skewness of ~0.9 and the long right tail of peak days. 
<strong>Right</strong>: after <em>log1p</em>, skewness drops dramatically and the distribution becomes much closer to normal. This is the version the models are trained on.</p><p>Because the models learn the log-transformed target, their predictions must be converted back to visitor counts with np.expm1(). I’ll come back to this in the modelling section and again when we build the production API in Part 2.</p><p>After engineering all 81 features, I checked which ones correlated most strongly with the target. Unsurprisingly, the lag and rolling features dominate the top of the list: as the feature correlation plot below shows, recent history is the single best predictor of tomorrow’s visitors.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/712/1*aYb_ydbPFTCDgopIfLk7Qg.png" /><figcaption>Top 30 Features by Correlation with Visitor Count</figcaption></figure><p>The rolling means and EWMA features cluster at the top with correlations close to 0.6, followed by the raw lag features. Temporal and calendar features contribute meaningfully but sit in the 0.2 to 0.4 range. This hierarchy gave a first indication of which features the models would lean on most.</p><p>Because the longest lag is 90 days, the first 90 rows have incomplete features and are dropped, leaving 1,006 usable rows (still plenty for training and evaluation).</p><p><strong>4. Model Training &amp; Comparison</strong></p><p>With features ready, I trained and compared four approaches. But first, the split strategy.</p><p><strong>Train/Test Split — Respecting Time</strong></p><p>This is a crucial one. With time series data, you cannot use random train/test splits. If you randomly shuffle dates, the model sees future data during training. That’s data leakage, and it will make your metrics look better than they really are.</p><p>Instead, I used the last 90 days as a hold-out test set, with everything before as training data. 
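</p><p>A minimal sketch of this time-respecting split is shown below. It assumes a date-sorted DataFrame with a <em>date</em> column; the 1,006-row frame starting on 2023-04-01 is a hypothetical stand-in for the engineered feature table. The last 90 rows become the test set, and scikit-learn’s <em>TimeSeriesSplit</em> generates expanding-window folds on the training portion only.</p>

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical stand-in for the feature table: 1,006 date-sorted daily rows.
df = pd.DataFrame({
    "date": pd.date_range("2023-04-01", periods=1006, freq="D"),
    "visitors": np.linspace(250, 400, 1006),
})

# Chronological hold-out: the last 90 days are the test set (never shuffle).
TEST_DAYS = 90
train, test = df.iloc[:-TEST_DAYS], df.iloc[-TEST_DAYS:]
print(f"Train: {len(train)} rows ({train['date'].min().date()} -> {train['date'].max().date()})")
print(f"Test : {len(test)} rows ({test['date'].min().date()} -> {test['date'].max().date()})")

# Expanding-window cross-validation on the training portion only.
tscv = TimeSeriesSplit(n_splits=5)
for fold, (tr_idx, va_idx) in enumerate(tscv.split(train), start=1):
    # Every validation row comes after every training row in the fold.
    assert va_idx.min() > tr_idx.max()
    print(f"Fold {fold}: train={len(tr_idx)} rows, validate={len(va_idx)} rows")
```

<p>The assertion inside the loop is the guarantee that matters: every validation fold starts after the last row its model was trained on, which is exactly what a random shuffle breaks.</p><p>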
The resulting split looks like this:</p><pre>Train: 916 rows  (2023-04-01 → 2025-10-02)<br>Test :  90 rows  (2025-10-03 → 2025-12-31)</pre><p>For cross-validation during training, I used <em>TimeSeriesSplit </em>with 5 folds, which creates expanding windows that always train on past data and validate on future data. At least no cheating here.</p><p><strong>The Models</strong></p><p>Before reaching for sophisticated algorithms, I needed to answer a basic question: <em>How good can one get without any machine learning at all?</em> That’s what baselines are for, and building them first is good practice. They’re deliberately simple prediction strategies that use plain arithmetic and no learned patterns. If your fancy model can’t beat these, then it’s not earning its complexity.</p><p>For the <strong>Baselines, </strong>I used three naive approaches, each slightly more informed than the last:</p><ul><li><strong>Global mean:</strong> predict every day as the average of the training set (~370 visitors). This ignores everything: day of week, weather, holidays. It’s the “shrug” prediction, but it anchors the bottom of our performance range.</li><li><strong>Last week’s visitors (lag 7):</strong> This predicts today’s visitors as whatever happened on the same day last week. It captures the weekly rhythm (Saturdays are busy, Tuesdays are quiet) but nothing else; it can’t adapt to weather changes, holidays, or trends.</li><li><strong>7-day moving average:</strong> This predicts today as the average of the past 7 days. It is smoother than the lag-7 approach because it dampens single-day spikes, but it also averages away the weekly rhythm and has no awareness of why visitor counts change. Basically, it just follows the recent trend.</li></ul><p>The key limitation of all three baselines is that each relies on a single signal, so they can’t combine information. 
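</p><p>To make that limitation concrete, here is a minimal pandas sketch of the three baselines. It assumes a date-indexed Series of daily visitor counts with no missing days and one-day-ahead evaluation (each test day is predicted from actuals up to the day before); the function name <em>baseline_predictions</em> and the toy series are hypothetical, not taken from the project repository.</p>

```python
import numpy as np
import pandas as pd


def baseline_predictions(history: pd.Series, test_index: pd.DatetimeIndex) -> pd.DataFrame:
    """Naive one-day-ahead baselines (hypothetical helper).

    history: daily visitor counts (DatetimeIndex, no gaps) covering the training
             period and the test period, so each test day can look back at observed days.
    """
    # Training data is everything strictly before the first test day.
    train = history.loc[: test_index[0] - pd.Timedelta(days=1)]
    return pd.DataFrame(
        {
            # 1) Global mean: the same training-set average for every day.
            "global_mean": np.full(len(test_index), train.mean()),
            # 2) Last week: the count observed 7 rows (= 7 days) earlier.
            "last_week": history.shift(7).reindex(test_index).to_numpy(),
            # 3) Moving average of the 7 days before the target day.
            "ma_7d": history.shift(1).rolling(7).mean().reindex(test_index).to_numpy(),
        },
        index=test_index,
    )


# Toy example: 28 days of steadily rising counts, last 7 days as the "test" window.
days = pd.date_range("2025-01-01", periods=28, freq="D")
visitors = pd.Series(200 + 10 * np.arange(28, dtype=float), index=days)
preds = baseline_predictions(visitors, days[-7:])
print(preds)
```

<p>Every column is computed from past visitor counts alone; no weather, holiday, or event information enters the calculation.</p><p>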
For example, they don’t know that a sunny Saturday during school holidays with a special exhibition is a fundamentally different day from a rainy Tuesday in winter.</p><p>That’s where the ML models come in. They learn to weigh dozens of features simultaneously and capture non-linear interactions between them. I made use of the following algorithms:</p><ol><li><strong>Random Forest </strong>— an ensemble of 300 decision trees, each grown on a random (bootstrap) sample of the data and considering only a random subset of features at each split (max_features=&#39;sqrt&#39;). Each tree independently learns rules like <em>&quot;if it&#39;s a weekend AND there&#39;s a special exhibition AND temperature &gt; 20°C, expect high traffic.&quot;</em> The final prediction averages across all 300 trees, which smooths out individual tree quirks. The min_samples_leaf=5 constraint requires every leaf to contain at least 5 training days, which stops trees from memorising individual noisy days. Like XGBoost, it was trained on the log-transformed target.</li><li><strong>XGBoost </strong>uses 500 sequential estimators, where each new tree specifically targets the errors the previous trees got wrong (gradient boosting). Whereas Random Forest builds trees independently and averages them, XGBoost builds them one at a time, with each tree focusing on the residual mistakes. The learning rate of 0.05 keeps each tree’s contribution small, which requires more trees but usually generalises better. I trained it on log1p(visitors) rather than raw visitor counts. This compresses the scale so the model doesn&#39;t disproportionately chase high-traffic outlier days at the expense of typical ones.</li><li><strong>Prophet</strong> — This is Facebook’s time series model, and it takes a fundamentally different approach. Instead of learning from engineered features, it models the series as <em>trend + seasonality + holiday effects</em> and fits all of these components together in a single model. 
I configured it with <em>multiplicative seasonality </em>(because seasonal swings scale with the baseline level) and added external regressors for holidays, exhibitions, and weather. It’s purpose-built for time series but less flexible at capturing the kind of complex feature interactions that tree-based models handle naturally.</li></ol><p>Here’s the XGBoost training code:</p><pre>import numpy as np<br>import xgboost as xgb<br><br>xgb_model = xgb.XGBRegressor(<br>    n_estimators=500,<br>    learning_rate=0.05,<br>    max_depth=6,<br>    subsample=0.8,<br>    colsample_bytree=0.8,<br>    min_child_weight=5,<br>    reg_alpha=0.1,<br>    reg_lambda=1.0,<br>    random_state=42,<br>    n_jobs=-1,<br>    verbosity=0<br>)<br># Train on log-transformed target (y_train_log = np.log1p(y_train))<br>xgb_model.fit(X_train, y_train_log)<br># Predictions come back in log space, so invert with expm1 to get visitor counts<br>xgb_preds = np.expm1(xgb_model.predict(X_test))</pre><p>The code for the other two approaches can be found <a href="https://github.com/wandabwa2004/museum-visitor-prediction/blob/dev/notebooks/04_modeling.ipynb">here</a>.</p><h4><strong>Results</strong></h4><p>Below are the results on the 90-day test set:</p><pre>| Model                | MAE   | RMSE  | R²    | MAPE  |<br>|----------------------|-------|-------|-------|-------|<br>| XGBoost              | 69.9  | 90.7  | 0.62  | 14.0% |<br>| Prophet              | 74.3  | 92.7  | 0.60  | 16.3% |<br>| Random Forest        | 78.4  | 103.7 | 0.50  | 16.3% |<br>| Baseline (7d MA)     | 119.3 | 152.7 | -0.09 | 26.5% |<br>| Baseline (last week) | 125.0 | 162.5 | -0.24 | 27.4% |<br>| Baseline (mean)      | 143.9 | 189.8 | -0.68 | 26.2% |</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*03Vs-lO5AXMPUhPXmySUBw.png" /><figcaption>Model Performance Comparison on Test Set</figcaption></figure><p>XGBoost wins on every metric. A MAPE of 14.0% means its predictions are off by about 14% on an average day. For a museum expecting 350 visitors, that’s roughly plus or minus 50 people. 
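</p><p>The four columns in the results table can be reproduced with a few lines of NumPy. The sketch below uses the standard definitions of MAE, RMSE, R² and MAPE (an assumption: the project’s own implementation may differ in details), and the function name <em>regression_metrics</em> is hypothetical. R² is computed against the mean of the actual test values, which is why a weak baseline can score below zero.</p>

```python
import numpy as np


def regression_metrics(actual, predicted):
    """Standard MAE, RMSE, R^2 and MAPE (%) for a set of daily predictions."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    # R^2 compares squared error with that of always predicting the mean of `actual`.
    r2 = 1.0 - np.sum(err ** 2) / np.sum((actual - actual.mean()) ** 2)
    # MAPE assumes no zero-visitor days (visitor counts are clipped at 50 in this dataset).
    mape = 100.0 * np.mean(np.abs(err) / actual)
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "MAPE": mape}


# Converting a MAPE into people for a given expected day.
mape_pct, expected_visitors = 14.0, 350
print(f"~{expected_visitors * mape_pct / 100:.0f} visitors")  # prints ~49 visitors
```

<p>Because MAPE is a relative measure, 14% of a 350-visitor day works out to about 49 people, which is where the rough plus-or-minus 50 comes from.</p><p>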
This is well within the range that operations teams can plan around.</p><p>Note that the negative R² values for the baselines don’t indicate a bug. A negative R² simply means the model performs worse than just predicting the mean of the test set, which confirms these baselines are too simplistic to be useful.</p><p>The improvement over baselines is substantial. The best baseline (7-day moving average) has an MAE of 119.3 visitors, while XGBoost achieves 69.9, a 41% reduction in error ((119.3 - 69.9) / 119.3 ≈ 0.41). That gap is the difference between useful predictions and educated guesses.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*6Mt6CcvmsM3tVLvx1w6vJg.png" /><figcaption>Actual Vs Predicted Visitor Numbers</figcaption></figure><p>XGBoost tracks the actual visitor pattern most closely, capturing both the weekly rhythm and the December holiday surge. Random Forest follows the shape but tends to undershoot peaks. Prophet captures the broad seasonal trend but smooths over the day-to-day variability that the tree-based models pick up. It still performs surprisingly well given that it doesn’t use the full feature set: its strength is capturing yearly and weekly seasonality automatically, but it can’t leverage the 81 engineered features as effectively as XGBoost.</p><p><strong>5. Model Evaluation &amp; SHAP Explainability</strong></p><p>Headline metrics tell you how well the model performs; a closer evaluation tells you where it struggles and why.</p><p><strong>Residual Analysis</strong></p><p>The four-panel residual analysis below gives a comprehensive picture of the model’s error behaviour.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*qpJVUQSuvTMeWBOoBbOWAw.png" /><figcaption>Residual Analysis</figcaption></figure><p>Top-left: residuals over time, with green and red shading marking where the model over- and under-predicts. Top-right: the residual distribution is roughly normal and centred near zero. 
Bottom-left: the predicted-vs-actual scatter follows the diagonal reasonably well, though the model underestimates the highest-traffic days. Bottom-right: percentage errors cluster within plus or minus 20%, with a slight negative skew.</p><p>The model has a slight negative bias of about -25 visitors/day, meaning it tends to under-predict on average. This is conservative, and when it comes to staffing it’s better to under-promise and over-deliver. Apart from that small offset, the residuals behave the way we want.</p><p><strong>Error Breakdown</strong></p><p>The model doesn’t perform equally well across all conditions:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*GfEnPJpQGHmHhX2kLkntPA.png" /><figcaption>Error Breakdown Analysis</figcaption></figure><p>Top row: MAE by day of week (weekends show higher error), month, and season. Bottom row: MAE split by public holidays, special exhibitions, and weekends.</p><ul><li><strong>Weekdays vs weekends</strong>: Higher error on weekends (more variable traffic)</li><li><strong>Public holidays</strong>: The highest MAE. These are outlier days with unpredictable spikes</li><li><strong>Special exhibitions</strong>: Also higher error, but less than public holidays</li><li><strong>Seasons</strong>: Summer and spring (high-traffic seasons) have larger absolute errors, though percentage errors remain similar</li></ul><p>This is expected. Extreme days are harder to predict because they’re driven by unique combinations of factors. Absolute errors also need context: being <em>“off by 70 visitors”</em> is a 23% miss on a 300-visitor day but under 8% on a 900-visitor day.</p><p><strong>SHAP Explainability</strong></p><p>I used <strong>SHAP </strong>(SHapley Additive exPlanations) to understand what drives the model’s predictions. 
This is crucial for two reasons: (1) <em>it validates that the model is learning sensible patterns</em>, and (2) <em>it gives you ammunition for explaining predictions to non-technical stakeholders</em>.</p><pre>import shap<br><br># model: the trained XGBoost regressor (xgb_model above)<br># X_shap: the feature rows to explain, e.g. the test set<br>explainer = shap.TreeExplainer(model)<br># The model was trained on log1p(visitors), so SHAP values are in log units<br>shap_values = explainer.shap_values(X_shap)<br><br>shap.summary_plot(shap_values, X_shap, plot_type=&#39;dot&#39;, max_display=20)</pre><p>The results confirm what you’d expect:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/763/1*_i0T7B5pFxOEud_Q43IlOA.png" /><figcaption>SHAP Values</figcaption></figure><p>The beeswarm plot shows each feature’s impact on predictions. Each dot is a single prediction, coloured by the feature value (red = high, blue = low). Seasonality and day-of-week features dominate the top, followed by contextual factors like precipitation, temperature, and boost_count. Lag and rolling features contribute meaningfully but sit in the middle of the ranking.</p><p>I also generated a waterfall plot for the single worst prediction day, i.e. the day where the model’s prediction was furthest from the actual visitor count. The waterfall plot below breaks down each feature’s contribution, showing which ones pushed the prediction up or down and by how much. This lets you diagnose why the model got it wrong: did it miss because something happened that day that no feature could capture (e.g., a surprise road closure), or because the feature values were an unusual combination it hadn’t seen during training (e.g., a sunny public holiday in winter)?</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/868/1*pdavteBSqN4JGASj5js7pQ.png" /><figcaption>Single prediction explanation for the worst prediction day</figcaption></figure><p><strong>Prediction Intervals</strong></p><p>A point prediction alone isn’t enough. Operations teams need to know the range of uncertainty. 
Therefore, I computed bootstrap prediction intervals using training residuals:</p><pre>import numpy as np<br><br># preds: point predictions for the test days, in visitor units (after np.expm1)<br># train_residuals: actual - predicted on the training set, also in visitor units<br># (in-sample residuals can understate test error; out-of-fold residuals are a safer choice)<br>rng = np.random.default_rng(42)<br><br># Bootstrap prediction intervals: add resampled residuals to each prediction<br>n_bootstrap = 500<br>bootstrap_preds = np.array([<br>    preds + rng.choice(train_residuals, size=len(preds), replace=True)<br>    for _ in range(n_bootstrap)<br>])<br><br># Percentiles across the 500 simulations give per-day interval bounds<br>lower_80 = np.percentile(bootstrap_preds, 10, axis=0)<br>upper_80 = np.percentile(bootstrap_preds, 90, axis=0)<br>lower_95 = np.percentile(bootstrap_preds, 2.5, axis=0)<br>upper_95 = np.percentile(bootstrap_preds, 97.5, axis=0)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*qSozWI70tws0Pj5aLtO7Gg.png" /><figcaption>Predictions with Bootstrap Intervals</figcaption></figure><p>The blue line tracks the model’s predictions against the actual visitor counts (black). The darker shaded band is the 80% interval; the lighter band is 95%.</p><p>The intervals are well-calibrated: the 80% interval captures approximately 80% of actual values, and the 95% interval captures approximately 95%. This is the kind of result that builds trust with stakeholders: <em>“We predict 380 visitors, with 80% confidence it’ll be between 310 and 450.”</em></p><p>It’s worth noting that this bootstrap method assumes residuals are identically distributed across all conditions. In practice, as we saw in the error breakdown, weekends and public holidays have higher error than quiet weekdays. A more refined approach would compute condition-specific intervals, but for an initial deployment this provides a useful and honest range.</p><p><strong>6. Stakeholder Engagement</strong></p><p>I know from experience that metrics like MAE and RMSE don’t resonate with stakeholders, who in this context are museum directors and day-to-day operations staff. What resonates is business impact. 
I framed the model’s performance in two ways.</p><p><strong>Traffic Tier Classification</strong></p><p>Instead of reporting raw numbers, each day’s prediction is mapped to a traffic tier:</p><pre>- Low: &lt; 250 visitors<br>- Medium: 250–450 visitors<br>- High: &gt; 450 visitors</pre><p>Each tier could easily map to a staffing plan and a security/safety deployment.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*VyExuam35iqVl90KZq0J0Q.png" /><figcaption>Business Impact Analysis</figcaption></figure><p>Left: the confusion matrix shows how often the model’s predicted tier matches reality. The diagonal dominance means operations teams can trust the signal. Right: revenue prediction errors (at $25/ticket) cluster tightly around zero, with most days falling within plus or minus $2,000 AUD.</p><p>The model’s tier accuracy gives the operations team a reliable signal: if the model says “High traffic,” they can confidently deploy the larger team.</p><p><strong>Revenue Forecasting</strong></p><p>If we assume a $25 AUD ticket (a hypothetical figure, since the museum is free on most days), every visitor prediction translates directly into revenue. The model’s mean daily revenue error, roughly 69.9 visitors × $25 ≈ $1,750 AUD, gives the finance team a concrete number to work with for budgeting and forecasting. For a data scientist, MAE = 69.9 visitors is worth celebrating. A museum director, however, will celebrate when they hear: <strong><em>“on a typical day, we can predict tomorrow’s ticket revenue within about plus or minus $1,750 AUD.”</em></strong></p><p>That wraps up Part 1, and I admit it was quite long. I promise to shorten Part 2. We’ve gone from the business problem through data generation, feature engineering, model training, evaluation, and business framing. 
The XGBoost model with 81 engineered features achieves a MAPE of 14.0%, which makes it production-ready for operations planning.</p><p>In <a href="https://hermanwandabwa.medium.com/stop-guessing-staffing-needs-how-id-predict-daily-museum-visitors-before-they-arrive-part-2-760498e03c4f">Part 2</a>, I’ll take this trained model and deploy it as a full-stack application: a FastAPI backend that serves predictions through a REST API, a React + Tailwind dashboard for the operations team, and a Supabase database for prediction logging. I’ll also cover deployment considerations and what I’d do differently with real data. I’ve never written about this before, so I’m very excited.</p><p>As always, all code is open-sourced <a href="https://github.com/wandabwa2004/museum-visitor-prediction/tree/dev/notebooks">here</a>. Please feel free to clone it and adapt it for your own venue. If you found this useful, a clap or comment goes a long way. You can find my other articles on my <a href="https://medium.com/@hermanwandabwa">profile</a>, and I’m always happy to connect via <a href="https://www.linkedin.com/in/wandabwaherman/">LinkedIn</a>.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=336ac0ca4f60" width="1" height="1" alt=""><hr><p><a href="https://medium.com/data-science-collective/stop-guessing-staffing-needs-predicting-daily-museum-visitors-before-they-arrive-part-1-336ac0ca4f60">Stop Guessing Staffing Needs: Predicting Daily Museum Visitors Before They Arrive (Part 1)</a> was originally published in <a href="https://medium.com/data-science-collective">Data Science Collective</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Finding the Perfect Spot: WiFi Analytics for “Premium” Space Advertising]]></title>
            <link>https://medium.com/data-science-collective/finding-the-perfect-spot-wifi-analytics-for-premium-space-advertising-b8470dd9181b?source=rss-58e995b6d0e3------2</link>
            <guid isPermaLink="false">https://medium.com/p/b8470dd9181b</guid>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[analytics]]></category>
            <category><![CDATA[spatial-computing]]></category>
            <category><![CDATA[data-analysis]]></category>
            <category><![CDATA[advertising]]></category>
            <dc:creator><![CDATA[Herman Wandabwa]]></dc:creator>
            <pubDate>Fri, 02 Jan 2026 01:49:55 GMT</pubDate>
            <atom:updated>2026-01-04T23:12:44.785Z</atom:updated>
            <content:encoded><![CDATA[<h4>A graph-inspired approach to identifying high-value advertising locations using anonymous foot traffic data</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*8dxFyhr32Ve5w5K8ZrDVog.png" /><figcaption>Image generated by author</figcaption></figure><p><a href="https://hermanwandabwa.medium.com/finding-the-perfect-spot-wifi-analytics-for-premium-space-advertising-b8470dd9181b?source=friends_link&amp;sk=79f7744a1237aa3f5eb63826a2e2b82e"><strong>Stuck behind a paywall? Read the article for free here</strong></a></p><p>It’s my fourth New Year here in Australia, a day I cherish greatly. Normally, I take this time to relax and reflect on the events of the previous year. However, this time around, I found myself reflecting deeply on my former professional life. Before working in the finance space, I actually worked in spatial analytics for a company that predicted visitor numbers in museums and exhibitions based on various factors. It was one of those roles where the team (mostly the product owners) spent half its time explaining to curators why ‘the spot near the bathrooms’ in a mall or museum isn’t actually premium real estate, despite the foot traffic. The other half was spent building models that some clients didn’t fully trust until they saw the results in action. It was fascinating work, and I genuinely enjoyed it.</p><p>Fast forward to today, and I realized I hadn’t touched spatial analytics in a long time. I’ll attribute this to being busy riding the Generative AI (Gen AI) wave, plus the usual modeling work that fills my days. I managed to write a few Gen AI articles and some predictive modeling pieces to justify the time spent on it. Many of them can be found <a href="https://medium.com/@hermanwandabwa">here</a>. Please read them for free and share your feedback. 
As always, all code accompanying each article is open-sourced, so feel free to re-implement it for your own problems.</p><p>For this reason, I decided to revisit spatial analytics once again, but with a twist. For those who may not be familiar with this use case, it’s simply the process of analyzing foot traffic patterns to identify the best locations for premium advertisements or exhibitions in malls and museums. For example, a jewelry shop or museum wanting to place a high-value exhibition needs to understand which locations actually deliver value. To accomplish this, they must understand not just how many people pass through a location, but how they move, where they linger, and which paths they take. Therefore, this will always be venue-specific and cannot be generalized. Additionally, they should have a way of measuring the success of their placement decisions. Don’t worry if this sounds complicated. I’ll break it down for you in the simplest terms possible with a relevant example.</p><h3>The Problem with “Premium” Space</h3><p>Here’s the uncomfortable truth: most “premium advertising space” decisions in malls/museums/exhibitions are still made the old-fashioned way. Someone senior walks the venue, points at a corner, and says, “This feels busy.” And to be fair, sometimes that intuition is correct. But it’s also the reason why premium placements become political. The leasing team thinks one spot is best, the tenants think another spot is best, and the marketing team is somewhere in the middle trying to justify the budget.</p><p>Now, if your venue offers free WiFi and (anonymously) tracks movement, you already have what many businesses dream of: behavioral data at scale. 
The trick is using it in a way that is statistically defensible, explainable to non-technical stakeholders, and respectful of privacy.</p><h3>Why This Approach is Different</h3><p>I’ll be honest: when I worked in spatial analytics, most of the analysis I saw treated locations as independent silos. Basically, all that was done was to count visitors, measure dwell time and call it a day. But that approach misses something fundamental. The real value of a location depends on:</p><ol><li>The <strong>network structure</strong> — the zones that connect to it</li><li><strong>Flow patterns</strong> — whether it’s a destination or a pass-through</li><li><strong>Movement dynamics</strong> — how people actually navigate the space</li><li><strong>Temporal patterns</strong> — when traffic peaks occur</li></ol><p>This is why I built my solution around a graph-inspired process rather than simple aggregation. Here, the venue (e.g., a museum or mall) is modeled as a network of connected zones, and visitor movement is analyzed as paths through this network. This approach captures something that simple traffic counts miss: a location’s value isn’t just about how many people stop there, but also about being on high-traffic paths and serving as a hub.</p><p>Think of it this way: a corridor with low dwell time but massive throughflow might be more valuable than a quiet corner where people linger. Both matter, but for different reasons and to different advertisers.</p><h3>Dataset Structure &amp; The WiFi Pings Reality</h3><p>I came up with the approach below, which would generally suit venues with WiFi tracking capabilities. However, the implementation is not exhaustive and should <strong>NOT</strong> be adopted as is in production environments. It’s subdivided as follows:</p><h4>a) Venue Graph Structure</h4><p>The first step is modeling the physical space as a graph where nodes represent zones (entrances, corridors, food courts, galleries etc.) 
and edges represent walkable connections between the zones. Each zone also has (x, y) coordinates for spatial visualization. This is fundamentally different from treating each location independently.</p><p>Below is the <strong>exact code I used</strong> to generate a mock venue graph (zones + edges) and simulate WiFi pings.</p><pre>import numpy as np<br>import pandas as pd<br><br>def generate_mock_venue(seed=7):<br>    &quot;&quot;&quot;<br>    Creates:<br>      - zones (a simplified mall/museum graph)<br>      - edges (walkable zone adjacency)<br>      - wifi pings: device_id, timestamp, zone_id, x, y, rssi (received signal strength indicator)<br>      - optional POIs / candidate ad locations<br>    &quot;&quot;&quot;<br>    rng = np.random.default_rng(seed)<br><br>    # ----- Venue zones (nodes) -----<br>    # Think of each zone as a &quot;Wi-Fi localization cell&quot; or area (corridor section / gallery / food court).<br>    zones = pd.DataFrame([<br>        {&quot;zone_id&quot;: &quot;ENT_N&quot;, &quot;name&quot;: &quot;North Entrance&quot;, &quot;type&quot;: &quot;entrance&quot;, &quot;x&quot;: 10, &quot;y&quot;: 90},<br>        {&quot;zone_id&quot;: &quot;ENT_S&quot;, &quot;name&quot;: &quot;South Entrance&quot;, &quot;type&quot;: &quot;entrance&quot;, &quot;x&quot;: 10, &quot;y&quot;: 10},<br>        {&quot;zone_id&quot;: &quot;COR_1&quot;, &quot;name&quot;: &quot;Main Corridor 1&quot;, &quot;type&quot;: &quot;corridor&quot;, &quot;x&quot;: 30, &quot;y&quot;: 70},<br>        {&quot;zone_id&quot;: &quot;COR_2&quot;, &quot;name&quot;: &quot;Main Corridor 2&quot;, &quot;type&quot;: &quot;corridor&quot;, &quot;x&quot;: 50, &quot;y&quot;: 50},<br>        {&quot;zone_id&quot;: &quot;COR_3&quot;, &quot;name&quot;: &quot;Main Corridor 3&quot;, &quot;type&quot;: &quot;corridor&quot;, &quot;x&quot;: 70, &quot;y&quot;: 30},<br>        {&quot;zone_id&quot;: &quot;FOOD&quot;,  &quot;name&quot;: &quot;Food Court&quot;, &quot;type&quot;: &quot;food&quot;, &quot;x&quot;: 80, &quot;y&quot;: 80},<br>        
{&quot;zone_id&quot;: &quot;ANCH_A&quot;,&quot;name&quot;: &quot;Anchor Store A&quot;, &quot;type&quot;: &quot;anchor&quot;, &quot;x&quot;: 90, &quot;y&quot;: 60},<br>        {&quot;zone_id&quot;: &quot;ANCH_B&quot;,&quot;name&quot;: &quot;Anchor Store B&quot;, &quot;type&quot;: &quot;anchor&quot;, &quot;x&quot;: 90, &quot;y&quot;: 20},<br>        {&quot;zone_id&quot;: &quot;GALL_1&quot;,&quot;name&quot;: &quot;Gallery 1&quot;, &quot;type&quot;: &quot;gallery&quot;, &quot;x&quot;: 40, &quot;y&quot;: 85},<br>        {&quot;zone_id&quot;: &quot;GALL_2&quot;,&quot;name&quot;: &quot;Gallery 2&quot;, &quot;type&quot;: &quot;gallery&quot;, &quot;x&quot;: 60, &quot;y&quot;: 85},<br>        {&quot;zone_id&quot;: &quot;REST&quot;,  &quot;name&quot;: &quot;Restrooms&quot;, &quot;type&quot;: &quot;utility&quot;, &quot;x&quot;: 65, &quot;y&quot;: 10},<br>        {&quot;zone_id&quot;: &quot;SEAT&quot;,  &quot;name&quot;: &quot;Seating / Atrium&quot;, &quot;type&quot;: &quot;amenity&quot;, &quot;x&quot;: 55, &quot;y&quot;: 55},<br>    ])<br><br> <br>    # zone adjacency edges - think of this as your &quot;walkable graph&quot; approximation. 
Remember this is hypothetical!<br>    edges = [<br>        (&quot;ENT_N&quot;, &quot;COR_1&quot;), (&quot;ENT_S&quot;, &quot;COR_3&quot;),<br>        (&quot;COR_1&quot;, &quot;COR_2&quot;), (&quot;COR_2&quot;, &quot;COR_3&quot;),<br>        (&quot;COR_2&quot;, &quot;SEAT&quot;),<br>        (&quot;COR_1&quot;, &quot;GALL_1&quot;), (&quot;GALL_1&quot;, &quot;GALL_2&quot;), (&quot;GALL_2&quot;, &quot;FOOD&quot;),<br>        (&quot;COR_2&quot;, &quot;ANCH_A&quot;), (&quot;COR_3&quot;, &quot;ANCH_B&quot;),<br>        (&quot;COR_3&quot;, &quot;REST&quot;),<br>        (&quot;FOOD&quot;, &quot;ANCH_A&quot;),<br>    ]<br><br>    # Convert to adjacency map for simulation<br>    adj = {}<br>    for a, b in edges:<br>        adj.setdefault(a, []).append(b)<br>        adj.setdefault(b, []).append(a)<br><br>    # ----- Simulate device journeys -----<br>    n_devices = 1500<br>    # sessions per device (some repeat visitors)<br>    sessions_per_device = rng.choice([1, 2, 3], size=n_devices, p=[0.72, 0.22, 0.06])<br><br>    # Campaign day timeline<br>    start = pd.Timestamp(&quot;2025-12-01 09:00:00&quot;)<br>    end   = pd.Timestamp(&quot;2025-12-01 21:00:00&quot;)<br>    total_minutes = int((end - start).total_seconds() // 60)<br><br>    # Entry zone bias: more people use south entrance<br>    entry_zones = [&quot;ENT_N&quot;, &quot;ENT_S&quot;]<br>    entry_p = [0.4, 0.6]<br><br>    # Zone &quot;stickiness&quot; (higher -&gt; more dwell)<br>    stickiness = {<br>        &quot;FOOD&quot;: 0.65, &quot;SEAT&quot;: 0.55, &quot;GALL_1&quot;: 0.45, &quot;GALL_2&quot;: 0.45,<br>        &quot;ANCH_A&quot;: 0.35, &quot;ANCH_B&quot;: 0.35,<br>        &quot;REST&quot;: 0.25,<br>        &quot;COR_1&quot;: 0.15, &quot;COR_2&quot;: 0.15, &quot;COR_3&quot;: 0.15,<br>        &quot;ENT_N&quot;: 0.10, &quot;ENT_S&quot;: 0.10<br>    }<br><br>    def simulate_session(device_id: str):<br>        # start time<br>        t0 = start + pd.Timedelta(minutes=int(rng.integers(0, total_minutes)))<br>      
  # session length (minutes)<br>        duration = int(rng.normal(75, 25))<br>        duration = int(np.clip(duration, 20, 180))<br>        t1 = t0 + pd.Timedelta(minutes=duration)<br><br>        # pick entry<br>        zone = rng.choice(entry_zones, p=entry_p)<br><br>        # simulate movement in 1-minute steps, later we can downsample to &quot;pings&quot;<br>        times = pd.date_range(t0, t1, freq=&quot;1min&quot;)<br>        zones_path = []<br>        for _ in times:<br>            zones_path.append(zone)<br>            # decide whether to stay<br>            if rng.random() &lt; stickiness.get(zone, 0.2):<br>                continue<br>            # else move to a neighbor<br>            nbrs = adj.get(zone, [])<br>            if not nbrs:<br>                continue<br>            # slight bias toward anchors/food places - chances are that stays are longer in such locations <br>            weights = []<br>            for n in nbrs:<br>                w = 1.0<br>                if n in [&quot;FOOD&quot;, &quot;ANCH_A&quot;, &quot;ANCH_B&quot;]:<br>                    w *= 1.35<br>                if n in [&quot;GALL_1&quot;, &quot;GALL_2&quot;]:<br>                    w *= 1.15<br>                weights.append(w)<br>            weights = np.array(weights) / np.sum(weights)<br>            zone = rng.choice(nbrs, p=weights)<br><br>        df = pd.DataFrame({&quot;device_id&quot;: device_id, &quot;timestamp&quot;: times, &quot;zone_id&quot;: zones_path})<br><br>        # add (x,y) with noise around zone centers<br>        zmap = zones.set_index(&quot;zone_id&quot;)[[&quot;x&quot;, &quot;y&quot;]].to_dict(&quot;index&quot;)<br>        df[&quot;x&quot;] = df[&quot;zone_id&quot;].map(lambda z: zmap[z][&quot;x&quot;] + rng.normal(0, 2.5))<br>        df[&quot;y&quot;] = df[&quot;zone_id&quot;].map(lambda z: zmap[z][&quot;y&quot;] + rng.normal(0, 2.5))<br><br>        # add RSSI proxy (more negative is weaker)<br>        # assume closer to AP in zone -&gt; stronger; 
we just simulate plausible values<br>        df[&quot;rssi&quot;] = -35 - (np.abs(rng.normal(0, 7, size=len(df))) + rng.uniform(0, 10, size=len(df)))<br>        return df<br><br>    all_sessions = []<br>    for i in range(n_devices):<br>        device_id = f&quot;d_{i:05d}&quot;<br>        for s in range(sessions_per_device[i]):<br>            all_sessions.append(simulate_session(device_id))<br><br>    traces = pd.concat(all_sessions, ignore_index=True)<br><br>    # ----- Downsample to &quot;wifi pings&quot; -----<br>    # Real Wi-Fi localization is irregular; simulate that by sampling each device ~ every 2-5 minutes.<br>    traces[&quot;minute&quot;] = traces[&quot;timestamp&quot;].dt.floor(&quot;min&quot;)<br>    # keep every step-th 1-minute record per device; step (2-5) is drawn once per device<br>    keep = np.zeros(len(traces), dtype=bool)<br>    for dev, g in traces.groupby(&quot;device_id&quot;, sort=False):<br>        step = int(rng.integers(2, 6))<br>        keep_idx = g.iloc[::step].index<br>        keep[keep_idx] = True<br>    pings = traces.loc[keep].drop(columns=[&quot;minute&quot;]).reset_index(drop=True)<br><br>    # Candidate ad/exhibit locations: pick zones that can host placements<br>    candidates = zones[zones[&quot;type&quot;].isin([&quot;corridor&quot;,&quot;food&quot;,&quot;amenity&quot;,&quot;gallery&quot;])].copy()<br>    candidates[&quot;is_candidate&quot;] = True<br><br>    return zones, pd.DataFrame(edges, columns=[&quot;from_zone&quot;,&quot;to_zone&quot;]), pings, candidates<br><br>zones, edges, pings, candidates = generate_mock_venue(seed=7)<br><br>print(&quot;zones:&quot;, zones.shape)<br>print(&quot;edges:&quot;, edges.shape)<br>print(&quot;pings:&quot;, pings.shape)<br>print(&quot;candidates:&quot;, candidates.shape)<br><br>pings.head()</pre><p>By encoding the graph structure, we can measure connectivity and flow, not just presence. 
This is crucial for understanding how zones relate to each other.</p><h4>b) WiFi Ping Data</h4><p>Real WiFi tracking data is irregular and sparse, which makes it messy to work with. Devices don’t ping at consistent intervals, connection quality varies, and people turn WiFi on and off.</p><p>In the code above, I simulated movement in 1-minute steps and then <strong>downsampled</strong> it into irregular pings (every ~2–5 minutes per device). That last step is important because it mimics what WiFi data looks like in practice.</p><p>“Stickiness” is another crucial parameter: it is the probability that a simulated visitor stays in their current zone during each 1-minute step. Food courts naturally have higher stickiness (people sit and eat), while corridors have low stickiness (people pass through). This creates realistic heterogeneity in the data, and yes, it’s based on patterns I observed in museums and exhibition spaces.</p><h4>c) Session Reconstruction from Irregular Pings</h4><p>As mentioned above, WiFi pings are irregular, so I needed to group them into coherent sessions. I used time gaps between consecutive pings to identify when a device has left and returned. 
This mirrors how many WiFi analytics systems infer visits.</p><p>Here’s the exact sessionisation function I used:</p><pre>def build_sessions(pings: pd.DataFrame, inactivity_gap_minutes=15):<br>    &quot;&quot;&quot;<br>    Convert irregular pings into sessions per device based on time gaps.<br>    &quot;&quot;&quot;<br>    df = pings.sort_values([&quot;device_id&quot;,&quot;timestamp&quot;]).copy()<br>    df[&quot;prev_ts&quot;] = df.groupby(&quot;device_id&quot;)[&quot;timestamp&quot;].shift(1)<br>    df[&quot;gap_min&quot;] = (df[&quot;timestamp&quot;] - df[&quot;prev_ts&quot;]).dt.total_seconds() / 60.0<br>    # new session if first ping or big inactivity gap<br>    df[&quot;new_sess&quot;] = (df[&quot;prev_ts&quot;].isna()) | (df[&quot;gap_min&quot;] &gt; inactivity_gap_minutes)<br>    df[&quot;session_id&quot;] = df.groupby(&quot;device_id&quot;)[&quot;new_sess&quot;].cumsum()<br>    df[&quot;session_key&quot;] = df[&quot;device_id&quot;] + &quot;_s&quot; + df[&quot;session_id&quot;].astype(str)<br>    return df.drop(columns=[&quot;prev_ts&quot;,&quot;gap_min&quot;,&quot;new_sess&quot;,&quot;session_id&quot;])</pre><p>You can’t assume continuous tracking, so sessions need to be inferred from sparse observations. I used a 15-minute inactivity threshold as a reasonable default, but you should tune this value based on your venue’s typical visit duration.</p><h3>The Metrics That Actually Matter</h3><p>This is where the approach gets interesting. I put together five categories of metrics that, together, paint a fuller picture of each location’s advertising value. These aren’t random choices; they are what advertisers actually care about when evaluating placements. To keep things clean, I calculate most zone metrics in one function and compute flow metrics separately (because those depend on transitions between zones).</p><h4>1. Reach Metrics</h4><p>This is basic but <strong>essential</strong>: reach is the number of unique devices seen in each zone. More eyeballs mean more potential impressions. 
This is the foundation of any advertising value calculation.</p><p>Here’s the exact metric function I used (reach + dwell + repeat + peakiness):</p><pre>def compute_zone_metrics(sessionized: pd.DataFrame, zones: pd.DataFrame, time_bucket=&quot;15min&quot;):<br>    &quot;&quot;&quot;<br>    Computes:<br>      - unique_reach: unique devices per zone<br>      - dwell_minutes: approx total dwell per zone from capped time deltas between pings<br>      - avg_dwell_per_device_min: mean dwell per device in the zone (attention proxy)<br>      - repeat_rate: share of sessions with more than one ping in the zone (repeat-exposure proxy)<br>      - peakiness_cv: how concentrated the traffic is in time (good for timed campaigns)<br>    &quot;&quot;&quot;<br>    df = sessionized.sort_values([&quot;session_key&quot;,&quot;timestamp&quot;]).copy()<br><br>    # time delta to next ping within session; cap to avoid huge dwell from sparse pings<br>    df[&quot;next_ts&quot;] = df.groupby(&quot;session_key&quot;)[&quot;timestamp&quot;].shift(-1)<br>    df[&quot;dt_min&quot;] = (df[&quot;next_ts&quot;] - df[&quot;timestamp&quot;]).dt.total_seconds() / 60.0<br>    df[&quot;dt_min&quot;] = df[&quot;dt_min&quot;].clip(lower=0, upper=8)  # cap dwell contribution<br><br>    # reach<br>    reach = df.groupby(&quot;zone_id&quot;)[&quot;device_id&quot;].nunique().rename(&quot;unique_reach&quot;)<br><br>    # dwell time per zone (sum of dt_min)<br>    dwell = df.groupby(&quot;zone_id&quot;)[&quot;dt_min&quot;].sum().rename(&quot;dwell_minutes&quot;)<br><br>    # average dwell per device (attention proxy)<br>    dwell_per_device = (df.groupby([&quot;zone_id&quot;,&quot;device_id&quot;])[&quot;dt_min&quot;].sum()<br>                          .groupby(&quot;zone_id&quot;).mean()<br>                          .rename(&quot;avg_dwell_per_device_min&quot;))<br><br>    # repeat exposure in-session: count pings in each zone within each session<br>    visits = (df.groupby([&quot;session_key&quot;,&quot;zone_id&quot;]).size()<br>                .rename(&quot;pings_in_zone&quot;)<br>                .reset_index())<br>    repeat = 
visits.groupby(&quot;zone_id&quot;)[&quot;pings_in_zone&quot;].apply(lambda x: (x &gt; 1).mean()).rename(&quot;repeat_rate&quot;)<br><br>    # peakiness: bucket traffic in time and compute coefficient of variation<br>    df[&quot;bucket&quot;] = df[&quot;timestamp&quot;].dt.floor(time_bucket)<br>    bucket_counts = df.groupby([&quot;zone_id&quot;,&quot;bucket&quot;])[&quot;device_id&quot;].nunique().reset_index(name=&quot;u&quot;)<br>    peakiness = bucket_counts.groupby(&quot;zone_id&quot;)[&quot;u&quot;].agg([&quot;mean&quot;,&quot;std&quot;])<br>    peakiness[&quot;peakiness_cv&quot;] = (peakiness[&quot;std&quot;] / peakiness[&quot;mean&quot;]).replace([np.inf, -np.inf], np.nan).fillna(0)<br>    peakiness = peakiness[&quot;peakiness_cv&quot;]<br><br>    out = pd.concat([reach, dwell, dwell_per_device, repeat, peakiness], axis=1).fillna(0).reset_index()<br>    out = out.merge(zones[[&quot;zone_id&quot;,&quot;name&quot;,&quot;type&quot;]], on=&quot;zone_id&quot;, how=&quot;left&quot;)<br>    return out</pre><h4>2. Engagement Metrics (Dwell Time)</h4><p>Dwell time is the attention proxy. Each ping contributes the time until the device’s next ping in the same session (dt_min), and that time is credited to the zone of the current ping. The 8-minute cap on each interval is important because sparse pings can create artificially long dwell times. Someone sitting in the food court for 45 minutes might only appear in the data twice, creating a huge apparent gap. The cap prevents these outliers from dominating the analysis. This is one of those things you learn the hard way when working with real WiFi data.</p><h4>3. Flow Metrics</h4><p>This is where the graph-inspired approach pays off. I measured <em>throughflow</em>, which is simply the number of transitions into and out of each zone. This captures something simple traffic counts are likely to miss.</p><p>Below is the exact flow function I used. 
It returns both the transition table T and per-zone flow metrics:</p><pre>def compute_flow_metrics(sessionized: pd.DataFrame, edges: pd.DataFrame):<br>    &quot;&quot;&quot;<br>    Build directed transitions between zones and compute:<br>      - throughflow: total transitions in + out for each zone<br>      - connectivity: number of distinct zones people come from and go to (observed)<br>    Returns:<br>      - T: transition table (from_zone, to_zone, n)<br>      - flow: per-zone flow metrics (zone_id, inflow, outflow, throughflow, connectivity)<br>    &quot;&quot;&quot;<br>    df = sessionized.sort_values([&quot;session_key&quot;, &quot;timestamp&quot;]).copy()<br>    df[&quot;prev_zone&quot;] = df.groupby(&quot;session_key&quot;)[&quot;zone_id&quot;].shift(1)<br><br>    trans = df.dropna(subset=[&quot;prev_zone&quot;]).copy()<br>    trans = trans[trans[&quot;prev_zone&quot;] != trans[&quot;zone_id&quot;]]<br><br>    # Transition counts<br>    T = (<br>        trans.groupby([&quot;prev_zone&quot;, &quot;zone_id&quot;]).size()<br>        .rename(&quot;n&quot;)<br>        .reset_index()<br>        .rename(columns={&quot;prev_zone&quot;: &quot;from_zone&quot;, &quot;zone_id&quot;: &quot;to_zone&quot;})<br>    )<br><br>    # Inflow/outflow/throughflow<br>    outflow = T.groupby(&quot;from_zone&quot;)[&quot;n&quot;].sum().reset_index().rename(<br>        columns={&quot;from_zone&quot;: &quot;zone_id&quot;, &quot;n&quot;: &quot;outflow&quot;}<br>    )<br>    inflow = T.groupby(&quot;to_zone&quot;)[&quot;n&quot;].sum().reset_index().rename(<br>        columns={&quot;to_zone&quot;: &quot;zone_id&quot;, &quot;n&quot;: &quot;inflow&quot;}<br>    )<br><br>    flow = pd.merge(inflow, outflow, on=&quot;zone_id&quot;, how=&quot;outer&quot;).fillna(0)<br>    flow[&quot;throughflow&quot;] = flow[&quot;inflow&quot;] + flow[&quot;outflow&quot;]<br><br>    # Connectivity proxy: distinct previous + distinct next (observed)<br>    distinct_next = 
T.groupby(&quot;from_zone&quot;)[&quot;to_zone&quot;].nunique().reset_index().rename(<br>        columns={&quot;from_zone&quot;: &quot;zone_id&quot;, &quot;to_zone&quot;: &quot;distinct_next_zones&quot;}<br>    )<br>    distinct_prev = T.groupby(&quot;to_zone&quot;)[&quot;from_zone&quot;].nunique().reset_index().rename(<br>        columns={&quot;to_zone&quot;: &quot;zone_id&quot;, &quot;from_zone&quot;: &quot;distinct_prev_zones&quot;}<br>    )<br><br>    conn = pd.merge(distinct_prev, distinct_next, on=&quot;zone_id&quot;, how=&quot;outer&quot;).fillna(0)<br>    conn[&quot;connectivity&quot;] = conn[&quot;distinct_prev_zones&quot;] + conn[&quot;distinct_next_zones&quot;]<br><br>    flow = pd.merge(flow, conn[[&quot;zone_id&quot;, &quot;connectivity&quot;]], on=&quot;zone_id&quot;, how=&quot;left&quot;).fillna(0)<br><br>    return T, flow</pre><p><strong><em>Why this matters:</em></strong> A corridor might have <em>low dwell time</em> but massive <em>throughflow</em>. For instance, if 5,000 people walk through it daily, that’s 5,000 potential advertising impressions, even if each person only spends 30 seconds there. This is the difference between a destination and a pathway, and both have value.</p><h4>4. Repeat Exposure</h4><p>A high repeat rate means the zone is either a hub people return to or unavoidable (like a main corridor). This is good for brand reinforcement. If someone passes your ad three times during their visit, that’s more valuable than a single pass, because repeated exposure makes the message more likely to stick. Repeat rate is also computed in compute_zone_metrics() above via “pings per session per zone”.</p><h4>5. Temporal Peakiness</h4><p>The coefficient of variation (CV), i.e. the standard deviation divided by the mean of unique devices per 15-minute bucket, tells you whether traffic is concentrated (high CV) or spread out (low CV). High peakiness is great for timed campaigns or events. Low peakiness means consistent exposure throughout the day. 
This is also included in compute_zone_metrics() above via peakiness_cv.</p><h4>The Scoring Formula</h4><p>With all these metrics in hand, I combine them into a single “<strong>premium score</strong>” (for lack of a better term) using weighted aggregation. This is where art meets science, because the weights reflect what different stakeholders care about.</p><p>Before scoring, I first combine everything into one dataset (sessionize → compute metrics → merge flow). Here’s exactly how I do that:</p><pre>sessionized = build_sessions(pings, inactivity_gap_minutes=15)<br>zone_metrics = compute_zone_metrics(sessionized, zones)<br>T, flow_metrics = compute_flow_metrics(sessionized, edges)<br><br>zone_metrics = zone_metrics.merge(flow_metrics[[&quot;zone_id&quot;,&quot;throughflow&quot;,&quot;connectivity&quot;]], on=&quot;zone_id&quot;, how=&quot;left&quot;).fillna(0)<br><br>zone_metrics.sort_values(&quot;unique_reach&quot;, ascending=False).head(10)</pre><p>Now for the scoring.</p><h4>Min-Max Normalization</h4><p>Without normalization, large raw counts such as reach and throughflow (hundreds to thousands) would completely swamp dwell times measured in minutes. 
Normalization brings everything to the same scale [0, 1], so the weights actually mean something.</p><p>Here’s the exact normalization function I used:</p><pre>def minmax(s: pd.Series):<br>    if s.max() == s.min():<br>        return pd.Series(np.zeros(len(s)), index=s.index)<br>    return (s - s.min()) / (s.max() - s.min())</pre><h4>Weighted Scoring + Ranking</h4><p>I used this ranking function that includes the justification string so you can defend the output with stakeholders.</p><pre>def rank_candidates(zone_metrics: pd.DataFrame, candidates: pd.DataFrame,<br>                    weights=None):<br>    if weights is None:<br>        weights = {<br>            &quot;unique_reach&quot;: 0.35,<br>            &quot;avg_dwell_per_device_min&quot;: 0.30,<br>            &quot;throughflow&quot;: 0.25,<br>            &quot;repeat_rate&quot;: 0.10<br>        }<br><br>    df = zone_metrics.merge(candidates[[&quot;zone_id&quot;,&quot;is_candidate&quot;]], on=&quot;zone_id&quot;, how=&quot;left&quot;)<br>    df = df[df[&quot;is_candidate&quot;] == True].copy()<br><br>    # Normalize each component so scales don&#39;t dominate<br>    for k in weights.keys():<br>        df[f&quot;{k}_norm&quot;] = minmax(df[k].astype(float))<br><br>    df[&quot;premium_score&quot;] = 0.0<br>    for k, w in weights.items():<br>        df[&quot;premium_score&quot;] += w * df[f&quot;{k}_norm&quot;]<br><br>    # Justification strings for each zone (top drivers)<br>    def explain_row(r):<br>        parts = []<br>        parts.append(f&quot;Reach={int(r[&#39;unique_reach&#39;])} devices&quot;)<br>        parts.append(f&quot;Avg dwell={r[&#39;avg_dwell_per_device_min&#39;]:.2f} min/device&quot;)<br>        parts.append(f&quot;Flow={int(r[&#39;throughflow&#39;])} transitions&quot;)<br>        parts.append(f&quot;Repeat={r[&#39;repeat_rate&#39;]:.2%}&quot;)<br>        return &quot; | &quot;.join(parts)<br><br>    df[&quot;justification&quot;] = df.apply(explain_row, axis=1)<br>    return 
df.sort_values(&quot;premium_score&quot;, ascending=False).reset_index(drop=True), weights<br><br>ranked, used_weights = rank_candidates(zone_metrics, candidates)<br>ranked[[&quot;zone_id&quot;,&quot;name&quot;,&quot;type&quot;,&quot;premium_score&quot;,&quot;justification&quot;]].head(10)</pre><p>Why these weights?</p><ul><li><strong>Reach (35%)</strong>: More people means more value. This is the foundation.</li><li><strong>Dwell time (30%)</strong>: Longer exposure means better ad absorption. Quality matters.</li><li><strong>Throughflow (25%)</strong>: Being on high-traffic paths matters even without long dwell.</li><li><strong>Repeat (10%)</strong>: Nice bonus for reinforcement, but secondary to the above.</li></ul><p>These weights are a judgment call, but they are not arbitrary: they reflect what advertisers actually care about, namely impressions (reach), attention (dwell), and position (flow). Because they sum to 1 and every component is scaled to [0, 1], the premium score also falls between 0 and 1. You can adjust the weights for your specific use case. For a quick-impact ad campaign (think movie posters), you might increase the throughflow weight. 
For detailed product displays, increase the dwell-time weight.</p><h4>Results and Justifications</h4><p>Running this on the simulated dataset, the top locations emerge with clear justifications.</p><p>Typical output looks something like this (exact numbers will vary with the random seed):</p><pre>zone_id  name              premium_score  justification<br>COR_2    Main Corridor 2   0.89          Reach=1247 devices | Avg dwell=2.30 min/device | Flow=8234 transitions | Repeat=45%<br>FOOD     Food Court        0.85          Reach=1089 devices | Avg dwell=5.80 min/device | Flow=6421 transitions | Repeat=38%<br>SEAT     Seating/Atrium    0.82          Reach=1156 devices | Avg dwell=4.20 min/device | Flow=5892 transitions | Repeat=52%<br>COR_1    Main Corridor 1   0.79          Reach=1198 devices | Avg dwell=1.80 min/device | Flow=7456 transitions | Repeat=41%<br>GALL_1   Gallery 1         0.74          Reach=892 devices | Avg dwell=3.90 min/device | Flow=4231 transitions | Repeat=29%</pre><p><strong>What this tells us:</strong></p><ol><li>Main corridors dominate despite low dwell time. Throughflow is massive. These are your “billboard on the highway” locations.</li><li>Food Court <strong>balances reach and engagement</strong>. People linger here, making it a good fit for detailed product displays or interactive kiosks.</li><li>Atrium/Seating is a hub with good <strong>repeat exposure</strong>. People return to rest, wait for friends, etc.</li><li>Galleries have engaged audiences but lower overall traffic. They could be great for niche products or high-value exhibitions.</li></ol><p>This is the kind of insight you can bring to a leasing meeting. Instead of “this corner feels busy,” you can say something like <em>“Main Corridor 2 has 8,234 daily transitions with 1,247 unique devices and 45% repeat exposure within sessions.”</em></p><h4>Visualization: Making It Tangible</h4><p>One thing I learned from my museum analytics days is that business stakeholders love spatial visualizations. Numbers are abstract. 
Maps are the real deal to them.</p><p>You can adopt this simple function for the plots:</p><pre>import matplotlib.pyplot as plt<br><br>def plot_zone_scores(zones, ranked, top_n=8):<br>    df = zones.merge(ranked[[&quot;zone_id&quot;,&quot;premium_score&quot;]], on=&quot;zone_id&quot;, how=&quot;left&quot;).fillna(0)<br><br>    plt.figure(figsize=(8,6))<br>    plt.scatter(df[&quot;x&quot;], df[&quot;y&quot;], s=200, alpha=0.6)<br>    for _, r in df.iterrows():<br>        plt.text(r[&quot;x&quot;]+0.8, r[&quot;y&quot;]+0.8, r[&quot;zone_id&quot;], fontsize=9)<br><br>    # highlight top<br>    top = ranked.head(top_n).merge(zones, on=&quot;zone_id&quot;, how=&quot;left&quot;)<br>    plt.scatter(top[&quot;x&quot;], top[&quot;y&quot;], s=600, alpha=0.35)<br><br>    plt.title(&quot;Venue zones + highlighted top premium placement candidates&quot;)<br>    plt.xlabel(&quot;x&quot;)<br>    plt.ylabel(&quot;y&quot;)<br>    plt.show()<br><br>plot_zone_scores(zones, ranked, top_n=6)</pre><p>The spatial plot below immediately shows which zones are physically clustered and which are isolated.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/700/1*jmdvCqvs5i3PZXKV8V4xQg.png" /><figcaption>Plot of zones + premium spaces</figcaption></figure><p>Combined with the scores, it helps identify strategic placement patterns. For example, you might notice that all your top zones are in one wing of the mall. This is useful information for planning advertising campaigns or tenant mix.</p><h4>Privacy and Ethics</h4><p>Before we wrap up, let me be very clear about privacy. This is not optional. 
Do it wrong and you’ll have lawsuits, not insights.</p><p><strong>What you must do:</strong></p><ul><li>Only store pseudonymous device IDs (hashed MAC addresses, never raw ones)</li><li>Aggregate results before reporting (zone-level metrics only, no individual tracking)</li><li>Clear signage about WiFi analytics</li><li>Compliance with GDPR/local privacy laws</li><li>Data retention policies (delete raw pings after aggregation)</li></ul><p><strong>What you MUST NEVER do:</strong></p><ul><li>Track specific individuals</li><li>Link WiFi data to payment data or personal info</li><li>Share raw movement data with third parties</li><li>Use data for purposes beyond what was disclosed</li></ul><p><strong>The general rule:</strong> <em>if you can’t explain your data collection to a concerned parent or privacy advocate, you shouldn’t be doing it.</em></p><h3>Future Directions</h3><p>This approach is solid, but there’s always room for improvement. Here are extensions I’d consider in future iterations:</p><h4>1. Demographic Proxies</h4><p>Device types can serve as rough demographic proxies. For example:</p><ul><li>iPhone 15 Pro → likely affluent (I could be biased on this)</li><li>Older Android models → budget-conscious</li><li>Tablets → families with children</li></ul><p>You can’t know for certain, but probabilistic inference adds value.</p><h4>2. Path Analysis</h4><p>Instead of just zone-level metrics, analyze common journeys:</p><ul><li>Entry → Corridor → Food Court → Exit</li><li>Entry → Gallery 1 → Gallery 2 → Exit</li></ul><p>This could reveal which advertising sequences work (billboard in corridor → detailed display in gallery).</p><h4>3. Time-of-Day Patterns</h4><p>Different hours have different audiences:</p><ul><li>Morning: commuters, coffee runs</li><li>Midday: lunch crowd, families</li><li>Evening: diners, entertainment seekers</li></ul><p>Adjust placement recommendations by time of day.</p><h4>4. 
A/B Testing Zones</h4><p>Actually place ads and measure conversion:</p><ul><li>QR code scans</li><li>App downloads</li><li>Coupon redemptions</li></ul><p>Feed this back into your model. Real performance beats all predictions.</p><h4>5. Causal Modeling</h4><p>What if an ad were placed here? This is uplift thinking applied to spatial analytics: estimate the causal effect of placement on engagement, rather than just its correlation with engagement. Maybe it’s time to revisit my earlier uplift modelling article <a href="https://medium.com/data-science-collective/stop-guessing-what-works-how-id-use-uplift-modelling-to-target-churn-interventions-without-fancy-3e4230de2575">here</a>.</p><h3>Conclusion</h3><p>That’s it for now. I’ve taken you through the entire process of building a graph-inspired WiFi analytics system for identifying premium advertising locations in any space. We’ve moved from simple traffic counting to graph-inspired flow analysis, capturing not just where people are, but how they move, why it matters, and what it’s worth.</p><p><strong>The key insight: </strong>location value is about more than traffic volume. It’s about the intersection of reach, engagement, flow, and connectivity. A low-dwell corridor on a critical path can be more valuable than a dead-end corner where people linger. Both matter, but for different reasons and to different advertisers.</p><p>I hope this walkthrough was useful. Don’t forget to follow me, clap for me, and leave a comment. Let me know if you’d like me to extend this to causal analysis (what-if scenarios for different layouts) or multi-venue comparisons. If you want to check out my other articles, you can find them on my <a href="https://medium.com/@hermanwandabwa">profile</a>. I’m also happy to connect via <a href="https://www.linkedin.com/in/wandabwaherman/">LinkedIn</a>.</p><p><strong>Technical Note:</strong> All data is simulated for educational purposes. 
Real deployments require proper WiFi infrastructure, privacy compliance, and ground-truth validation.</p><hr><p><a href="https://medium.com/data-science-collective/finding-the-perfect-spot-wifi-analytics-for-premium-space-advertising-b8470dd9181b">Finding the Perfect Spot: WiFi Analytics for “Premium” Space Advertising</a> was originally published in <a href="https://medium.com/data-science-collective">Data Science Collective</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Stop Guessing What Works — How I’d Use Uplift Modelling to Target Churn Interventions without fancy…]]></title>
            <link>https://medium.com/data-science-collective/stop-guessing-what-works-how-id-use-uplift-modelling-to-target-churn-interventions-without-fancy-3e4230de2575?source=rss-58e995b6d0e3------2</link>
            <guid isPermaLink="false">https://medium.com/p/3e4230de2575</guid>
            <category><![CDATA[uplift-analysis]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[customer-retention]]></category>
            <category><![CDATA[churn-prediction]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <dc:creator><![CDATA[Herman Wandabwa]]></dc:creator>
            <pubDate>Wed, 17 Sep 2025 11:20:09 GMT</pubDate>
            <atom:updated>2025-09-17T21:16:55.873Z</atom:updated>
            <content:encoded><![CDATA[<h3>Stop Guessing What Works — How I’d Use Uplift Modelling to Target Churn Interventions without fancy AI</h3><h4><strong>From predicting who might leave to targeting who you can actually save. Using uplift to focus the budget on “persuadables,” not “sure things” or “lost causes.”</strong></h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*6fiUxKbdEg1deXHL" /><figcaption>Image generated by author</figcaption></figure><p>In my <a href="https://medium.com/data-science-collective/stop-guessing-who-will-leave-how-i-would-predict-customer-churn-before-it-happens-300d34cbd13c">previous article</a>, I explored <strong>propensity-to-churn modelling</strong> in a simulated banking setting, a small love letter to classical Machine Learning (ML) after a year of GenAI. The piece is long (I know 😅), but worth your time, so I recommend reading it first. For the “stubborn ones,” here’s the gist: I showed how a bank could simulate customer data, train logistic regression and gradient boosting models, and rank customers by churn risk. If you’d like the full story, you can read it <a href="https://medium.com/data-science-collective/stop-guessing-who-will-leave-how-i-would-predict-customer-churn-before-it-happens-300d34cbd13c">here</a>. It’s a free link, so no subscription is required.</p><p>A friend (a lawyer, working hard to understand AI) told me to simplify my writing for a wider reach. Fair point. So in this article, I’ll keep things simple, as I’m genuinely trying to make technical concepts accessible to everyone. Back to customer churn.</p><p><strong>Prediction is usually only half the story.</strong> Yes, we can estimate who is likely to leave.
But the real product question is:</p><p><em>👉 “If I intervene, will it actually make a difference?”</em></p><p>That’s where an interesting concept called <strong>uplift modelling/analysis</strong> comes in, moving us from <em>predictions</em> to <em>actions</em>.</p><h4>What is Uplift Analysis?</h4><p>Uplift analysis measures the <strong>causal impact</strong> of an intervention. Basically, instead of asking <strong>“Who will churn?”</strong>, we ask <strong>“Who can we save <em>because</em> we intervened?”</strong></p><p><strong>Why uplift analysis matters: </strong>Hypothetically, if your churn model flags 100 “high-risk” customers and you blast all of them with discounts, you’ll very likely waste budget on many who would <strong>stay anyway</strong>. You’ll also likely miss customers who stay <strong>only</strong> if you actually help them, because a high churn score alone doesn’t identify them.</p><p>In uplift terms, and applied to the banking scenario from my previous article, we group customers into four plain-English segments:</p><ul><li><strong>Persuadables</strong> will churn (leave the bank, in our case) if we do nothing but stay if we intervene. <em>(Sweet spot.)</em></li><li><strong>Sure Things</strong> will stay regardless. <em>(No need to spend anything here.)</em></li><li><strong>Lost Causes</strong> will leave regardless of the treatment. <em>(Acknowledge and move on.)</em></li><li><strong>Sleeping Dogs</strong> would have stayed, but the intervention pushes them away. I sometimes feel like I’m a <em>“Sleeping Dog”</em> with a number of my utility providers. I’m always waiting for a trigger to drop them: if they contact me, I’ll “remember” the bad experience I had with them and, of course, leave.
It seems they know it, as no “annoying” deals have come through just yet.</li></ul><p>This framing prevents “blanket offers” and focuses the budget where it actually changes outcomes.</p><h4>Why Uplift Makes More Sense Than Just Propensity</h4><p>Typically, a propensity score says something like <strong>“Customer A has an 80% chance of churning (leaving).”</strong> That’s helpful but incomplete: it doesn’t say whether an <strong>SMS, fee waiver, email, or phone call</strong> is likely to <strong>change that outcome</strong>.</p><p>Uplift models, by contrast, capture the difference between:</p><ol><li><strong>Churn probability if treated</strong> (sent an SMS or email, or given a loyalty credit) and</li><li><strong>Churn probability if not treated</strong> (nothing is done)</li></ol><p>That difference (churn probability if not treated minus churn probability if treated) is the <strong>uplift score</strong>. The conclusion would be something like:</p><ul><li><strong>Positive uplift</strong> → treatment helps (good to target).</li><li><strong>Negative uplift</strong> → treatment hurts (avoid!).</li></ul><h3>Reusing the Churn Dataset (and of course Extending It Slightly)</h3><p>In the previous churn <a href="https://medium.com/data-science-collective/stop-guessing-who-will-leave-how-i-would-predict-customer-churn-before-it-happens-300d34cbd13c">article</a>, I simulated a <a href="https://github.com/wandabwa2004/churn_uplift/blob/main/data/equity_bank_churn_dataset.csv">realistic-ish retail-bank dataset</a> with features covering demographics, digital engagement, balances, loans, etc. To keep things simple here, I’ll extend the same CSV by adding a <strong>treatment</strong> flag that represents a retention offer (e.g., fee waiver, loyalty bonus, branch call).</p><blockquote>In real life, you’d randomize who gets the offer during a small pilot.
Randomization makes uplift measurement honest and simple.</blockquote><p>I’ll add a few essential code snippets here for you to run and replicate the additional columns and uplift computation. I’ll keep the code minimal and readable for everyone. As usual, the full notebook with more extensive steps is on my <a href="https://github.com/wandabwa2004/churn_uplift/tree/main">GitHub</a>.</p><h4>1. Addition of a Treatment Flag</h4><p>Assuming the dataset is the one generated by running main.py (available <a href="https://github.com/wandabwa2004/churn_in_finance/tree/master/data">here</a>), we’ll simulate a 50/50 randomized offer and a mild “it helps” effect for illustration purposes. In real-life scenarios, the size of the effect depends on the business context, so do not use this approach on real production data.</p><pre>import numpy as np<br>import pandas as pd<br><br>df = pd.read_csv(&quot;equity_bank_churn_dataset.csv&quot;)<br>np.random.seed(42)<br><br># 1) Randomized treatment (demonstration only): each customer has a 50% chance of getting the offer<br>df[&quot;treatment&quot;] = np.random.binomial(1, 0.5, len(df))<br><br># 2) Simple treated-world probability (reduce churn prob by ~10 points if treated)<br>df[&quot;churn_probability&quot;] = pd.to_numeric(df[&quot;churn_probability&quot;], errors=&quot;coerce&quot;).fillna(0.0)<br>TREATMENT_EFFECT = -0.10  # negative because the offer lowers the churn probability<br>df[&quot;adj_churn_prob&quot;] = (df[&quot;churn_probability&quot;] + TREATMENT_EFFECT * df[&quot;treatment&quot;]).clip(0, 1)<br><br># 3) Synthetic treated outcome (for learning purposes only; in reality you observe just one outcome per customer)<br>df[&quot;churned_treated&quot;] = np.random.binomial(1, df[&quot;adj_churn_prob&quot;]).astype(int)<br><br># Observed outcome for this tutorial:<br># - control customers keep their original &#39;churned&#39; label<br># - treated customers use the synthetic treated outcome<br>label_col = &quot;churned&quot; if &quot;churned&quot; in
df.columns else &quot;churn&quot;<br>df[&quot;outcome&quot;] = np.where(df[&quot;treatment&quot;]==1, df[&quot;churned_treated&quot;], df[label_col]).astype(int)</pre><h4>⚠ Please don’t fabricate outcomes in production settings. Run a small <strong>randomized</strong> pilot so your treated vs. control outcomes are real.</h4><h4>2. Two Small Models (Classical + Explainable)</h4><p>Computing the uplift involves training <strong>one</strong> model to predict churn for <strong>treated</strong> customers and <strong>another</strong> for <strong>control</strong> customers. Both models then predict churn for everyone, and the <strong>difference</strong> between the two predictions is the uplift. This two-model setup is often called a T-learner. As simple as it sounds, that’s the core idea. The code snippet below illustrates it:</p><pre>from sklearn.base import clone<br>from sklearn.compose import ColumnTransformer<br>from sklearn.impute import SimpleImputer<br>from sklearn.preprocessing import OneHotEncoder, StandardScaler<br>from sklearn.pipeline import Pipeline<br>from sklearn.linear_model import LogisticRegression<br><br># Keep it simple: auto-detect features from the CSV<br># (IDs, dates, labels and outcome-related columns are dropped to avoid leakage)<br>DROP = {&quot;customer_id&quot;,&quot;account_open_date&quot;,&quot;churn_date&quot;,&quot;last_active_date&quot;,&quot;reason_code&quot;,<br>        &quot;churn_probability&quot;,&quot;churned&quot;,&quot;churn&quot;,&quot;churn_flag&quot;,<br>        &quot;treatment&quot;,&quot;adj_churn_prob&quot;,&quot;churned_treated&quot;,&quot;outcome&quot;}<br><br>cat_cols, num_cols = [], []<br>for c in df.columns:<br>    if c in DROP:<br>        continue<br>    (cat_cols if df[c].dtype==&quot;object&quot; else num_cols).append(c)<br><br>pre = ColumnTransformer([<br>    (&quot;cat&quot;, Pipeline([(&quot;imp&quot;, SimpleImputer(strategy=&quot;most_frequent&quot;)),<br>                      (&quot;ohe&quot;, OneHotEncoder(handle_unknown=&quot;ignore&quot;, sparse_output=False))]), cat_cols),<br>    (&quot;num&quot;, Pipeline([(&quot;imp&quot;, SimpleImputer(strategy=&quot;median&quot;)),<br>                      
(&quot;sc&quot;, StandardScaler())]), num_cols),<br>])<br><br># Split into treated/control<br>df_treat = df[df[&quot;treatment&quot;]==1].copy()<br>df_ctrl  = df[df[&quot;treatment&quot;]==0].copy()<br><br>X_treat, y_treat = df_treat[cat_cols+num_cols], df_treat[&quot;outcome&quot;]<br>X_ctrl,  y_ctrl  = df_ctrl[cat_cols+num_cols],  df_ctrl[&quot;outcome&quot;]<br><br># clone(pre): each model gets its own preprocessor (a shared one would be refit by the second .fit())<br># No class_weight=&quot;balanced&quot;: it shifts each model&#39;s probabilities differently and would bias the uplift<br>m_treat = Pipeline([(&quot;pre&quot;, clone(pre)), (&quot;clf&quot;, LogisticRegression(max_iter=1000))]).fit(X_treat, y_treat)<br>m_ctrl  = Pipeline([(&quot;pre&quot;, clone(pre)), (&quot;clf&quot;, LogisticRegression(max_iter=1000))]).fit(X_ctrl,  y_ctrl)<br><br># Predict both worlds for everyone<br>X_all = df[cat_cols+num_cols]<br>df[&quot;p_treat&quot;] = m_treat.predict_proba(X_all)[:,1]   # P(churn | treat)<br>df[&quot;p_ctrl&quot;]  = m_ctrl.predict_proba(X_all)[:,1]    # P(churn | no treat)<br>df[&quot;uplift&quot;]  = df[&quot;p_ctrl&quot;] - df[&quot;p_treat&quot;]        # positive =&gt; good to treat</pre><p>This is the core: <strong>uplift = p(no-treat) − p(treat)</strong>.</p><p>Higher uplift → bigger expected benefit from intervening.</p><h4>3.
Turn Scores Into Actions (The Four Segments)</h4><p>We’ll bucket customers with <strong>simple, explainable</strong> rules. Both the rules and the thresholds can be tweaked to match your budget and risk appetite.</p><pre># Simple quantile thresholds<br>up_pos = df[&quot;uplift&quot;].quantile(0.70)   # clearly positive uplift<br>up_neg = min(df[&quot;uplift&quot;].quantile(0.10), 0.0)   # bottom 10%, capped at 0 so positive uplift is never a Sleeping Dog<br>r_hi   = df[&quot;p_ctrl&quot;].quantile(0.60)   # genuinely at risk without treatment<br>r_lo   = df[&quot;p_ctrl&quot;].quantile(0.20)   # pretty safe already<br><br>def segment(u, r):<br>    if u &lt;= up_neg: return &quot;Sleeping Dog&quot;          # treatment backfires<br>    if (u &gt;= up_pos) and (r &gt;= r_hi): return &quot;Persuadable&quot;<br>    if (r &lt;= r_lo) and (up_neg &lt; u &lt; up_pos): return &quot;Sure Thing&quot;<br>    if (r &gt;= r_hi) and (up_neg &lt; u &lt; up_pos): return &quot;Lost Cause&quot;<br>    return &quot;Gray Zone&quot;  # pilot group that you can learn from<br><br>df[&quot;segment&quot;] = [segment(u, r) for u, r in zip(df[&quot;uplift&quot;], df[&quot;p_ctrl&quot;])]<br><br>df[&quot;action&quot;] = df[&quot;segment&quot;].map({<br>    &quot;Persuadable&quot;: &quot;Target (intervene)&quot;,<br>    &quot;Sure Thing&quot;: &quot;Do not treat&quot;,<br>    &quot;Lost Cause&quot;: &quot;Do not treat (low ROI)&quot;,<br>    &quot;Sleeping Dog&quot;: &quot;Avoid (may backfire)&quot;,<br>    &quot;Gray Zone&quot;: &quot;Test small / learn&quot;<br>})</pre><p>Reading this as a product owner, and I know some of you are:</p><ul><li><strong>Treat</strong> Persuadables first (money well spent).</li><li><strong>Skip</strong> Sure Things (they stay anyway).</li><li><strong>Skip</strong> Lost Causes (hard to move).</li><li><strong>Exclude</strong> Sleeping Dogs (don’t poke the bear!).</li><li><strong>Gray Zone</strong>: run tiny tests and learn from their behaviour. You can then decide what to do with them.</li></ul><h4>4.
“Does This Pay?”</h4><p>This is a fundamental question the business is likely to ask. I’d answer it this way. If the average <strong>Customer Lifetime Value (CLV)</strong> is CLV and the treatment <strong>cost</strong> per customer is COST, the expected profit of treating customer <em>i</em> can be formulated as follows:</p><p><strong>Profitᵢ ≈ upliftᵢ × CLV − COST</strong></p><p><strong>What this actually means:</strong></p><ul><li><strong>upliftᵢ</strong>: the <strong>estimated drop in churn probability</strong> for customer <em>i</em> <strong>if treated</strong> (i.e., p_ctrl − p_treat from the models above). It lies between −1 and 1, and it is negative when treatment makes churn more likely.</li><li><strong>CLV</strong>: the <strong>net customer lifetime value</strong> (in money) you gain by keeping a customer. This could be defined differently depending on the business case.</li><li><strong>COST</strong>: the <strong>per-customer cost</strong> of the intervention (SMS, call, discount, emails, etc.).</li></ul><p>Because upliftᵢ is a change in churn probability, <strong>upliftᵢ × CLV</strong> is the <strong>expected monetary value</strong> retained by treating that customer. Subtracting <strong>COST</strong> gives the <strong>expected profit</strong> of the treatment.</p><ul><li>If Profitᵢ &gt; 0, the customer is worth treating.</li><li>If Profitᵢ &lt; 0, the treatment is expected to cost more than it returns, so skip that customer.</li></ul><p>I’ll use an example here to illustrate this:</p><p>Suppose upliftᵢ = 0.25, CLV = $250 and COST = $5. Then:</p><p><strong>Profitᵢ ≈ (0.25 × 250) − 5 = 62.5 − 5 = 57.5</strong></p><p>So, on average, treating this customer is expected to yield <strong>$57.50</strong> in profit. Here, I treat any customer with a positive expected profit. Depending on your budget, you can raise this threshold and, for example, target only the customers likely to yield higher profits.</p><p>The below code illustrates this.
Simply pick the top customers by this number, among those with a positive expected profit, until you hit your budget.</p><pre>CLV, COST, BUDGET = 250.0, 5.0, 5000  # BUDGET = max number of customers to treat; change all three to fit your context/goals<br><br>df[&quot;expected_profit&quot;] = df[&quot;uplift&quot;] * CLV - COST<br># Keep only customers worth treating (positive expected profit), best first, capped at BUDGET<br>campaign = (df[df[&quot;expected_profit&quot;] &gt; 0]<br>            .sort_values(&quot;expected_profit&quot;, ascending=False)<br>            .head(BUDGET))<br><br>print(&quot;Selected:&quot;, len(campaign), &quot;Expected total profit ~&quot;, int(campaign[&quot;expected_profit&quot;].sum()))<br>campaign[[&quot;customer_id&quot;,&quot;uplift&quot;,&quot;p_ctrl&quot;,&quot;p_treat&quot;,&quot;segment&quot;,&quot;action&quot;,&quot;expected_profit&quot;]].head(10)</pre><p>To a large extent, this makes the decision “honest”: treatment is offered only where the <strong>incremental</strong> benefit outweighs the cost.</p><h3>Practical Notes (and please read)</h3><ul><li><strong>Randomize</strong> where you can. Even a small A/B pilot (e.g., 10% holdout) will make your uplift estimates far more trustworthy.</li><li><strong>Be careful with non-random treatment.</strong> If teams hand-pick who to call, treatment gets tangled up with whatever drove their choices, so naive uplift estimates will be biased. In this case, consider causal ML methods that adjust for this, or run a clean experiment.</li><li><strong>Define &quot;churn&quot; very clearly, as it&#39;s tricky.</strong> In banking, it could be account closure, 90-day inactivity, or a balance near zero. Some customers churn more than once, so decide how to handle repeat churners. In short, your label definition drives what the model learns.</li><li><strong>Watch Sleeping Dogs.</strong> Aggressive “save” campaigns can create complaints and unsubscribes. As I mentioned earlier, I’m in this segment with a few of my utility providers.</li><li><strong>Keep the experiments explainable.</strong> Starting with classical models helps get business buy-in. The two-model setup above is already a simple T-learner; you can always upgrade to fancier approaches later (e.g., X-learners or causal forests).</li></ul><h3>Conclusion</h3><p>That’s it for now.
In <a href="https://medium.com/data-science-collective/stop-guessing-who-will-leave-how-i-would-predict-customer-churn-before-it-happens-300d34cbd13c"><strong>Part 1</strong></a>, we stopped guessing <strong>who</strong> will leave. In this article, we stop guessing <strong>what works</strong> to keep them.</p><p><strong>Uplift analysis</strong> bridges the gap between <strong>prediction</strong> and <strong>intervention</strong>. With a small, classical setup, you can rank customers by <strong>incremental impact</strong>, segment them into <strong>Persuadables / Sure Things / Lost Causes / Sleeping Dogs</strong>, and spend your budget where it actually moves the needle. As usual, my full code is <a href="https://github.com/wandabwa2004/churn_uplift/tree/main">here</a>, and I hope it&#39;s easy to follow.</p><p>I hope this walkthrough was also useful. Don’t forget to follow me, clap for me, and leave a comment. If you want to check out my other articles:</p><ul><li><a href="https://medium.com/data-science-collective/stop-guessing-who-will-leave-how-i-would-predict-customer-churn-before-it-happens-300d34cbd13c">STOP Guessing Who Will Leave — How I Would Predict Customer Churn Before It Happens</a></li><li><a href="https://medium.com/data-science-collective/from-data-to-dialogue-development-of-a-retrieval-augmented-generation-rag-chatbot-for-fitness-b9fbaf818ace">How to Create an Entire RAG System as a Newbie</a></li><li><a href="https://hermanwandabwa.medium.com/optimizing-equipment-maintenance-planning-with-deepseek-reasoning-llm-and-agents-crewai-a063114f8bb6">AI-Powered Equipment Maintenance Planning: Leveraging DeepSeek LLM and CrewAI for Smarter Decisions</a></li><li><a href="https://medium.com/data-science/capacity-optimization-in-freight-trains-part-1-4918f35a6433">Capacity Optimization in Freight Trains — Part 1</a></li><li><a href="https://medium.com/swlh/6-kgs-lost-in-31-days-of-covid-19-lockdown-a-data-analytics-perspective-a0061e0689f2">6 Kgs Lost in 31 
Days of COVID-19 Lockdown: A Data Analytics Perspective</a></li></ul><hr><p><a href="https://medium.com/data-science-collective/stop-guessing-what-works-how-id-use-uplift-modelling-to-target-churn-interventions-without-fancy-3e4230de2575">Stop Guessing What Works — How I’d Use Uplift Modelling to Target Churn Interventions without fancy…</a> was originally published in <a href="https://medium.com/data-science-collective">Data Science Collective</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[STOP Guessing Who Will Leave — How I Would Predict Customer Churn Before It Happens]]></title>
            <link>https://medium.com/data-science-collective/stop-guessing-who-will-leave-how-i-would-predict-customer-churn-before-it-happens-300d34cbd13c?source=rss-58e995b6d0e3------2</link>
            <guid isPermaLink="false">https://medium.com/p/300d34cbd13c</guid>
            <category><![CDATA[customer-churn]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[churn]]></category>
            <category><![CDATA[propensity-model]]></category>
            <category><![CDATA[data-science]]></category>
            <dc:creator><![CDATA[Herman Wandabwa]]></dc:creator>
            <pubDate>Thu, 14 Aug 2025 05:13:17 GMT</pubDate>
            <atom:updated>2025-08-18T21:52:16.308Z</atom:updated>
            <content:encoded><![CDATA[<h3><strong>STOP Guessing Who Will Leave — How I Would Predict Customer Churn Before It Happens</strong></h3><h4><em>A simple, step-by-step approach to classical churn modelling that works — no black-box AI required</em></h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*U4n_oP_23cybkGBX" /><figcaption>Image generated by the author</figcaption></figure><p><a href="https://hermanwandabwa.medium.com/stop-guessing-who-will-leave-how-i-would-predict-customer-churn-before-it-happens-300d34cbd13c?source=friends_link&amp;sk=d0e1edb4c57fb35931c2bccbbd2026fd"><strong>Stuck behind a paywall? Read for free!</strong></a></p><p>In a discussion with a fellow data scientist the other day, I realized that I hadn’t done much classical Machine Learning (ML) for the last year or so, even though it’s something I really enjoyed as a product data scientist. I attribute this to being busy riding the ever-changing Generative AI (GenAI) wave. I managed to write a few GenAI articles to justify the time spent on it. Many of them can be found <a href="https://medium.com/me/stories/public">here</a>. Please read them and give some feedback. As always, all my code is open-sourced <a href="https://github.com/wandabwa2004/DS_Projects">here</a>.</p><p>It&#39;s for this reason that I decided to revisit <strong>propensity modelling</strong>. For those who may not have heard the term, propensity modelling is simply the process of predicting the likelihood of a certain behaviour. For example, a jewelry shop might predict how likely customers aged 21 to 50 are to buy a certain ring. For this to happen, the shop has to define the target outcome (whether those customers buy the ring or not), which is why propensity modelling is always use-case specific. The shop should also have a way of measuring the success of the modelling process. Don’t worry if this sounds complicated.
I’ll break it down for you in the simplest terms possible with a relevant example.</p><h3>Propensity to Churn in Finance</h3><p>Allow me to use a hypothetical example from the finance world. People often join banks to accomplish certain financial goals. However, some customers realise that their bank isn’t meeting those goals and decide to leave after some time. If the bank tries to estimate statistically how likely each customer is to leave, that is a classic example of <strong>propensity-to-churn</strong> (leaving the bank) modelling. By modelling this problem, a bank can better understand why certain customers left and act to prevent further exits.</p><p>I’ll now walk you through a simplified example of a hypothetical Kenyan bank that is trying to predict the likelihood of some of its customers leaving. In this example, I’ll demonstrate how customer data spanning demographics, digital engagement, financial behavior, and product usage can be structured to model and predict customer churn. As mentioned above, this is a hypothetical example, meaning the data is also simulated. While the data is synthetic, I’ve tried to make it closely reflect real-world banking behavior in a Kenyan context. For those who’ve banked in Kenya, the incorporated data points will ring a bell, especially mobile money usage and other region-specific features.</p><h3>Dataset Structure &amp; Feature List</h3><p>I came up with the features below, which financial institutions would generally collect. However, the list is not exhaustive and should not be adopted as is in production environments. The features are subdivided as follows.</p><p><strong>a) Customer Identity &amp; Demographics</strong></p><pre>- customer_id (unique), age (years), gender<br>- region (e.g.
Nairobi, Rift Valley, Coast, Western, etc.), urban (1=urban, 0=rural)<br>- education_level (none/primary/secondary/tertiary)<br>- employment_status (employed, self-employed, informal, unemployed)<br>- KYC_verified (Yes/No)</pre><p><strong>b) Account &amp; Tenure Details</strong></p><pre>- account_open_date<br>- tenure_days (computed)<br>- customer_age_of_money (days since first deposit)<br>- account_type (savings, current, ecobank Eco-Savings, etc.)<br>- linked_mobile_number (is the SIM registered to the same person? flags third-party SIMs)</pre><p><strong>c) Digital Engagement &amp; Mobile Banking Usage</strong></p><pre>- uses_equity_mobile_app (binary)<br>- equity_mobile_sessions_last_30d (count)<br>- equity_mobile_trans_volume_last_90d (KES)<br>- equity_mobile_txn_count_last_90d<br>- uses_equitel (binary, Equity Group SIM)<br>- equitel_txn_count_last_90d<br>- equity_online_login_count<br>- pesalink_inbound_count<br>- pesalink_outbound_count (interbank transfers)<br>- atm_withdrawals_count<br>- atm_deposits_count</pre><p><strong>d) Mobile Money Ecosystem</strong></p><pre>- mpesa_linked_to_bank (Yes/No)<br>- mpesa_cash_in_count, mpesa_cash_out_count (last 90d)<br>- mpesa_balance_avg_last_30d<br>- uses_mshwari (Yes/No)<br>- mshwari_savings_balance<br>- mshwari_loans_count<br>- fuliza_overdraft_count<br>- fuliza_overdraft_amt</pre><p><strong>e) Transaction Behavior &amp; Financial Usage</strong></p><pre>- avg_balance_last_6m<br>- min_balance_last_6m<br>- num_deposits_last_90d<br>- num_withdrawals_last_90d<br>- avg_monthly_inflow<br>- avg_monthly_outflow<br>- loan_applications_count<br>- loans_disbursed_amt<br>- loan_repayment_rate (ratio repaid vs due)<br>- credit_card_prepaid_spend / usage</pre><p><strong>f) Savings &amp; Spending Patterns</strong></p><pre>- savings_rate (deposit volume / inflow)<br>- spend_rate (withdrawal volume / inflow)<br>- recency_deposit_days (days since last deposit)<br>- frequency_deposit (days between deposits)<br>- 
recency_digital_txn_days<br>- frequency_digital_access_days</pre><p><strong>g) Behavioral Indicators &amp; Support Interactions</strong></p><pre>- complaints_count_last_year (e.g. service, fraud)<br>- branch_visits_count_last_year<br>- customer_support_calls_last_6m<br>- biometric_enabled_atm_user (Yes/No)<br>- security_alerts_triggered (password resets, blocked login)</pre><p><strong>h) Loyalty-Related Features</strong></p><pre>- num_products_held (e.g. account, card, loan, insurance)<br>- loyalty_program_member (Yes/No)<br>- rewards_redeem_count<br>- referrals_count (how many new customers referred)</pre><p><strong>i) Churn/Outcome Variables</strong></p><pre>- churned (1=no activity or account closed within last 90 days)<br>- churn_date<br>- last_active_date<br>- reason_code (survey-captured reason: e.g. switched bank, poor service, moved regions, digital usability issues, trust/security concerns)</pre><p>This feature list is not exhaustive by any means. Internally, banks likely capture many more features, for example richer detail on branch visits and on customers’ financial well-being. These are often good indicators of churn, especially when customers visit branches to complain.</p><p>FYI, I won’t use every feature above in the modelling process, so don’t worry if you see some features missing.</p><h3>Data Simulation</h3><p>Coming up with a list of features like the one above is the easy part. The tricky bit is tweaking the simulated data to be statistically as close as possible to what the bank would have. Real data is also messy: compared with what I have here, it comes with realistic feature distributions, complete with missing values, unusual spikes, etc.</p><p>The code I put together generates a simulated dataset of about 10,000 customers, with account tenures spanning roughly three years. The numbers are not set in stone and can be changed.
In reality, customers can churn more than once, but this simulation focuses on a single churn event per customer within the observation period. If you have the time, try modifying the code to simulate repeat churn.</p><p>The code is split into three files, config.py, generator.py and main.py, for easier debugging. FYI, this is a good software engineering habit for data scientists to adopt. <strong>config.py</strong> holds all the settings, such as the sample size, dates, region mix, urban shares, churn target, and distribution parameters. This lets you tune assumptions without touching the logic. <strong>generator.py</strong> contains small, readable functions that build the dataset in stages (population → adoption → transactions → amounts → credit → churn), with deterministic randomness for reproducibility. <strong>main.py</strong> is just a wrapper that loads the config, assembles the dataset, prints a quick churn sanity check, and writes an equity_bank_churn_dataset.csv file. The data folder is <a href="https://github.com/wandabwa2004/churn_in_finance/tree/master/data">here</a>.</p><p>Try playing with the parameters in main.py and config.py, and you’ll get a fresh dataset each time you change them. Note that it might not be easy to hit a specific churn rate in the data, e.g., 30%, if that’s your target. If you run into this issue, try adjusting <em>churn_target</em> and reducing <em>weibull_scale_days</em>; you’re likely to reach your target churn rate after a few iterations.</p><h3>Statistical Distribution in the Data</h3><p>As mentioned in the introduction, the distributions in the data have to mimic the diversity you’d see in an actual Kenyan retail bank.
This meant that I had to introduce a few nuances in the data as follows:</p><ol><li><em>Urban</em> and <em>rural</em> customers are modeled differently: customers in Nairobi are all treated as urban, while the other regions have a mix of rural and urban customers, in proportions meant to reflect real-life patterns.</li><li><em>Digital engagement</em> varies accordingly, with urban customers more likely to use the mobile app, transact via Equitel, or link their M-Pesa wallets to their bank accounts. This may not be entirely accurate, but I kept it this way for simplicity.</li><li><em>Numeric variables</em> like transaction counts and monetary amounts are generated using statistical distributions that fit the nature of each feature. For example:<ul><li><em>Poisson</em> and <em>Negative Binomial</em> distributions for counts like mobile sessions, deposit and withdrawal transactions, or branch visits. These capture the fact that most customers make only a handful of transactions, while a few transact heavily.</li><li><em>Lognormal</em> and <em>Gamma</em> distributions for balances, loan amounts, savings, etc. The idea here is to create a realistic skew where most customers hold small amounts, while a few hold very large sums.</li><li>A <em>Beta</em> distribution for <em>loan_repayment_rate</em>, which keeps all values between 0 and 1 while skewing towards high repayment for employed or self-employed customers. This is also an assumption that a proper analysis of a real dataset might not support.</li></ul></li></ol><p>All these are defined in the generator.py file in the data folder <a href="https://github.com/wandabwa2004/churn_in_finance/blob/master/data/generator.py">here</a>.</p><p>Below is a summary of the train and test set features.
The split is time-based, with the training data covering 2022-01-01 → 2023-12-31.</p><pre>=== TRAIN DATA ===<br>Rows: 6,694 | Columns: 25<br>Date range: 2022-01-01 → 2023-12-31<br>Churn rate: 33.21%<br><br>Numeric feature ranges:<br>                                           min          max          mean<br>age                                    18.0000      69.0000     40.545414<br>urban                                   0.0000       1.0000      0.592620<br>uses_equity_mobile_app                  0.0000       1.0000      0.723185<br>uses_equitel                            0.0000       1.0000      0.300717<br>mpesa_linked_to_bank                    0.0000       1.0000      0.841649<br>equity_mobile_sessions_last_30d         0.0000      45.0000      4.785181<br>equity_mobile_txn_count_last_90d        0.0000      93.0000     13.161189<br>mpesa_cash_in_count                     0.0000      64.0000      6.577980<br>mpesa_cash_out_count                    0.0000      42.0000      5.221691<br>num_deposits_last_90d                   0.0000      21.0000      3.943681<br>num_withdrawals_last_90d                0.0000      23.0000      3.593666<br>equity_mobile_trans_volume_last_90d  4532.4100  411993.9300  61774.712601<br>avg_balance_last_6m                  2132.8600  380194.2900  51316.757398<br>mshwari_savings_balance                 0.0000   32067.2900   3071.699382<br>mshwari_loans_count                     0.0000       9.0000      1.675530<br>fuliza_overdraft_amt                   32.2200   18405.4300   3077.626280<br>loan_applications_count                 0.0000       6.0000      1.231999<br>loan_repayment_rate                     0.0987       0.9993      0.740785<br>branch_visits_count_last_year           0.0000       8.0000      1.610397<br>complaints_count_last_year              0.0000       4.0000      0.204661<br><br>Most frequent categories:<br>  gender: Female (51.7%)<br>  region: Rift Valley (23.8%)<br>  education_level: Secondary (47.9%)<br>  
employment_status: Employed (45.3%)<br>  KYC_verified: Yes (91.8%)</pre><p>The test set, on the other hand, covers one year of data that doesn&#39;t overlap with the training period (2024-01-01 → 2024-12-31). Its churn rate is much lower, most likely because customers who opened accounts in 2024 have had less time to churn within the three-year observation window.</p><pre>=== TEST DATA ===<br>Rows: 3,306 | Columns: 25<br>Date range: 2024-01-01 → 2024-12-31<br>Churn rate: 7.59%<br><br>Numeric feature ranges:<br>                                           min          max          mean<br>age                                    18.0000      69.0000     40.555354<br>urban                                   0.0000       1.0000      0.593164<br>uses_equity_mobile_app                  0.0000       1.0000      0.734725<br>uses_equitel                            0.0000       1.0000      0.306413<br>mpesa_linked_to_bank                    0.0000       1.0000      0.847550<br>equity_mobile_sessions_last_30d         0.0000      37.0000      4.920145<br>equity_mobile_txn_count_last_90d        0.0000      79.0000     13.261948<br>mpesa_cash_in_count                     0.0000      50.0000      6.626134<br>mpesa_cash_out_count                    0.0000      37.0000      5.165759<br>num_deposits_last_90d                   0.0000      22.0000      3.941016<br>num_withdrawals_last_90d                0.0000      23.0000      3.537205<br>equity_mobile_trans_volume_last_90d  3471.6600  324709.7100  60852.307114<br>avg_balance_last_6m                  2967.5200  511006.9400  50976.922426<br>mshwari_savings_balance                 0.0000   28821.7300   3090.220230<br>mshwari_loans_count                     0.0000       9.0000      1.702057<br>fuliza_overdraft_amt                   18.8900   15597.9200   3011.771385<br>loan_applications_count                 0.0000       6.0000      1.212341<br>loan_repayment_rate                     0.1345       0.9995      0.740829<br>branch_visits_count_last_year           0.0000       9.0000      1.615245<br>complaints_count_last_year              0.0000       4.0000      0.209316<br><br>Most frequent 
categories:<br>  gender: Female (50.2%)<br>  region: Rift Valley (24.6%)<br>  education_level: Secondary (46.8%)<br>  employment_status: Employed (44.5%)<br>  KYC_verified: Yes (92.0%)</pre><p>Churn as the target outcome is not random. The assumption is that it’s influenced by many factors, such as low digital engagement, rural location, and higher complaint counts. Therefore, the time until churn is generated using a <a href="https://medium.com/utility-machine-learning/survival-analysis-part-1-the-weibull-model-5c2552c4356f">Weibull survival model</a>, which lets the likelihood of churn increase over time, especially for higher-risk customers. I’ve tuned the simulation so that around 30% of customers churn within the three-year window, similar to what many banks are likely to experience in practice.</p><p>While this is still a simplification of reality, the dataset is statistically rich enough to demonstrate churn prediction techniques, explore the relationship between customer attributes and retention, and test how targeting strategies might work before deploying them on real, messy banking data. Remember, this is a simulated dataset and should not be used as a replacement for real data.</p><h3><strong>Propensity to Churn Modeling</strong></h3><p>I’ve defined the features and explained the rationale behind the data simulation. This data is not messy, so not much cleaning is needed. However, that will not be the case with a real dataset, which will typically have many missing and sometimes heavily skewed values.</p><p>With the dataset preparation out of the way, the next step is to build predictive models that estimate each customer’s propensity to churn. For this project, I applied two complementary algorithms:</p><ol><li><strong>Logistic Regression</strong>—a simple, easily interpretable algorithm that is widely used in banking for its transparency. 
It lets business teams see exactly how each feature influences churn risk.</li><li><strong>Gradient Boosting</strong>—a powerful ensemble method that can model complex, non-linear relationships and interactions between features. It often achieves higher predictive accuracy at the cost of interpretability.</li></ol><p>Feel free to test a few more algorithms if you have the time. In particular, have a look at the Random Forest algorithm and how its performance differs from Gradient Boosting. This might just save you in your next interview.</p><p>By training and evaluating both models on the same data, we can compare their performance using metrics like <a href="https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html">ROC_AUC</a>, <a href="https://arize.com/blog/what-is-pr-auc/">Precision-Recall AUC</a>, and Lift and Gains curves, enabling data-driven decisions on which model best suits each campaign targeting strategy.</p><p>The code below is part of the modelling workflow. It covers the train/test split and the preprocessing of categorical and numeric variables; sections 5 and 6 define and train the models, and section 7 evaluates them on the test set. After generating the sample data, run the full script <a href="https://github.com/wandabwa2004/churn_in_finance/blob/master/py/model.py">here</a> in an activated environment to reproduce the entire modelling run. 
To do so, run python model.py from a terminal while inside the py folder.</p><pre>df = pd.read_csv(DATA_PATH)<br>df = parse_dates(df, DATE_COLS)<br><br>y = df[TARGET].astype(int).values<br>X = df.drop(columns=[c for c in DROP_COLS if c in df.columns] + [TARGET], errors=&quot;ignore&quot;)<br><br># Only add columns that are not already in X<br>cols_to_add = [col for col in DATE_COLS + [TARGET] if col not in X.columns]<br>df_for_split = pd.concat([X, df[cols_to_add]], axis=1)<br>train_df, test_df = temporal_split(df_for_split, AS_OF_CUTOFF)<br><br>X_train = train_df.drop(columns=DATE_COLS + [TARGET])<br>y_train = train_df[TARGET].astype(int).values<br>X_test  = test_df.drop(columns=DATE_COLS + [TARGET])<br>y_test  = test_df[TARGET].astype(int).values<br><br>print(f&quot;Train: {len(X_train):,} rows | Test: {len(X_test):,} rows&quot;)<br><br># Describe training and test sets<br>describe_dataset(X_train, y_train, train_df[&quot;account_open_date&quot;], &quot;Train&quot;)<br>describe_dataset(X_test, y_test, test_df[&quot;account_open_date&quot;], &quot;Test&quot;)<br><br># -----------------------------<br># 4) Preprocessing<br># -----------------------------<br>cat_cols = X_train.select_dtypes(include=[&quot;object&quot;, &quot;category&quot;]).columns.tolist()<br>num_cols = [c for c in X_train.columns if c not in cat_cols]<br><br>cat_pipe = Pipeline([<br>    (&quot;imputer&quot;, SimpleImputer(strategy=&quot;most_frequent&quot;)),<br>    (&quot;ohe&quot;, OneHotEncoder(handle_unknown=&quot;ignore&quot;, sparse_output=False)),<br>])<br><br>num_pipe = Pipeline([<br>    (&quot;imputer&quot;, SimpleImputer(strategy=&quot;median&quot;)),<br>    (&quot;scaler&quot;, StandardScaler()),<br>])<br><br>pre = ColumnTransformer([<br>    (&quot;cat&quot;, cat_pipe, cat_cols),<br>    (&quot;num&quot;, num_pipe, num_cols),<br>])<br><br># -----------------------------<br># 5) Models<br># -----------------------------<br>logreg = 
LogisticRegression(max_iter=1000, class_weight=&quot;balanced&quot;, random_state=SEED)<br>gbdt   = GradientBoostingClassifier(<br>    random_state=SEED,<br>    learning_rate=0.05,<br>    n_estimators=300,<br>    max_depth=3,<br>    subsample=0.9<br>)<br><br>pipe_lr  = Pipeline([(&quot;pre&quot;, pre), (&quot;clf&quot;, logreg)])<br>pipe_gbdt = Pipeline([(&quot;pre&quot;, pre), (&quot;clf&quot;, gbdt)])<br><br># -----------------------------<br># 6) Train<br># -----------------------------<br>pipe_lr.fit(X_train, y_train)<br>pipe_gbdt.fit(X_train, y_train)<br><br># -----------------------------<br># 7) Evaluate<br># -----------------------------<br>results = []<br>results.append(evaluate_model(&quot;Logistic Regression&quot;, pipe_lr, X_test, y_test))<br>results.append(evaluate_model(&quot;Gradient Boosting&quot;, pipe_gbdt, X_test, y_test))<br><br>print(&quot;\n=== MODEL COMPARISON (Test Set) ===&quot;)<br>for r in results:<br>    print(f&quot;{r[&#39;name&#39;]}:&quot;)<br>    print(f&quot;  ROC AUC:        {r[&#39;roc_auc&#39;]:.4f}&quot;)<br>    print(f&quot;  PR  AUC:        {r[&#39;pr_auc&#39;]:.4f}&quot;)<br>    print(f&quot;  Top 10% Prec.:  {r[&#39;top10_precision&#39;]:.4f}&quot;)<br>    print(f&quot;  Top 20% Prec.:  {r[&#39;top20_precision&#39;]:.4f}&quot;)<br>    print(f&quot;  Confusion Matrix @0.5 [TN FP; FN TP]:\n{r[&#39;conf_matrix&#39;]}&quot;)<br>    print(&quot;-&quot; * 50)</pre><h3>Model Evaluation Results</h3><p>To evaluate the predictive performance of the two models, I tested both <em>Logistic Regression</em> and <em>Gradient Boosting</em> on the held-out test set, i.e., data left out of the training phase. As mentioned above, performance was assessed using a mix of threshold-independent metrics (ROC AUC, PR AUC) and business-relevant targeting metrics (Top-k Precision, Confusion Matrix). 
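</p><p>To make the targeting metrics concrete, here is a minimal, dependency-free sketch of how Top-k Precision, cumulative gain, and lift can be computed from a list of churn labels and predicted probabilities, plus a rank-based ROC AUC (the probability that a randomly chosen churner is scored above a randomly chosen non-churner). The helper names top_k_precision, gain_at, lift_at and roc_auc_by_ranking are hypothetical; they are not taken from the repo’s evaluate_model function, whose internals aren’t shown here, and the pairwise AUC loop is only meant for small illustrative inputs.</p>

```python
# A minimal, dependency-free sketch of the ranking metrics discussed above.
# Assumptions: y_true holds 0/1 churn labels and scores holds predicted churn
# probabilities. All helper names are hypothetical, not from the repo.


def rank_labels(y_true, scores):
    """Return labels ordered from the highest to the lowest predicted score."""
    pairs = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    return [label for _, label in pairs]


def _cutoff(n, frac):
    """Number of customers in the top `frac` of a list of length n (at least 1)."""
    return max(1, round(frac * n))


def top_k_precision(y_true, scores, frac):
    """Share of actual churners among the top `frac` of customers by score."""
    ranked = rank_labels(y_true, scores)
    k = _cutoff(len(ranked), frac)
    return sum(ranked[:k]) / k


def gain_at(y_true, scores, frac):
    """Cumulative gain: share of all churners captured in the top `frac`."""
    ranked = rank_labels(y_true, scores)
    total_churners = sum(ranked)
    if total_churners == 0:
        return 0.0
    k = _cutoff(len(ranked), frac)
    return sum(ranked[:k]) / total_churners


def lift_at(y_true, scores, frac):
    """Top-k precision divided by the base churn rate (1.0 = random targeting)."""
    base_rate = sum(y_true) / len(y_true)
    return top_k_precision(y_true, scores, frac) / base_rate


def roc_auc_by_ranking(y_true, scores):
    """Probability that a random churner outscores a random non-churner.

    Ties count as half a win. This O(n_pos * n_neg) loop is for illustration only.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: 10 customers, 2 churners, listed in descending score order.
    y = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    s = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    print(top_k_precision(y, s, 0.2))  # 1 churner in the top 2 -> 0.5
    print(gain_at(y, s, 0.3))          # both churners in the top 3 -> 1.0
    print(lift_at(y, s, 0.2))          # 0.5 / 0.2 base rate -> 2.5
    print(roc_auc_by_ranking(y, s))    # 15 of 16 pairs ranked correctly -> 0.9375
```

<p>As a quick sanity check on the results that follow, lift at a given depth equals Top-k Precision divided by the base churn rate: a Top 10% Precision of 0.1909 against the 7.59% test churn rate corresponds to a lift of about 2.5 for Logistic Regression (0.1909 / 0.0759 ≈ 2.52).</p><p>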
The results are summarized below:</p><pre>| Metric               | Logistic Regression | Gradient Boosting |<br>| -------------------- | ------------------- | ----------------- |<br>| ROC AUC              | 0.7872              | 0.7814            |<br>| Precision-Recall AUC | 0.1739              | 0.1804            |<br>| Top 10% Precision    | 0.1909              | 0.1970            |<br>| Top 20% Precision    | 0.1785              | 0.1710            |</pre><ol><li>Both models achieve similar <strong>ROC_AUC</strong> scores (~0.78), indicating a good ability to rank churners above non-churners. Their PR AUC values (~0.17–0.18) look low in absolute terms, but a random model would score roughly the 7.59% test churn rate, so both are well above that baseline.</li><li><em>Gradient Boosting</em> edges out <em>Logistic Regression</em> in <strong>Top 10% Precision</strong> (0.1970 vs 0.1909). This means it&#39;s slightly better at identifying the highest-risk customers, making it a good fit for small, focused retention campaigns. If I were in charge of implementing the results and the budget were tight, I’d use the Gradient Boosting scores.</li><li><em>Logistic Regression</em> slightly outperforms in <strong>Top 20% Precision</strong> (0.1785 vs 0.1710), which makes it competitive for broader targeting strategies.</li></ol><h4>Confusion Matrices @ 0.5 Threshold</h4><p>A confusion matrix is a table that cross-tabulates a classifier’s predicted classes against the actual classes, so true negatives, false positives, false negatives, and true positives can be read side by side. Here, the predictions come from a 0.5 probability threshold. 
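</p><p>To illustrate how such a table is built, here is a minimal sketch that thresholds predicted probabilities at 0.5, tallies the four cells, and derives precision and recall from them. The function names are hypothetical, and labelling a probability of exactly 0.5 as churn is an assumption about tie handling that may differ from the repo’s evaluate_model.</p>

```python
# Minimal sketch: confusion-matrix cells at a probability threshold, plus the
# precision and recall they imply. Function names are hypothetical.


def confusion_at_threshold(y_true, probs, threshold=0.5):
    """Return (tn, fp, fn, tp); probs at or above `threshold` count as churn."""
    tn = fp = fn = tp = 0
    for label, prob in zip(y_true, probs):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def precision_recall(tn, fp, fn, tp):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN). tn is unused."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy check: one customer lands in each cell.
print(confusion_at_threshold([1, 0, 1, 0], [0.9, 0.6, 0.2, 0.1]))  # (1, 1, 1, 1)

# Logistic Regression test-set counts from the article.
lr_precision, lr_recall = precision_recall(tn=1880, fp=1175, fn=39, tp=212)
print(f"precision={lr_precision:.3f}, recall={lr_recall:.3f}")  # precision=0.153, recall=0.845
```

<p>Plugging in the Logistic Regression counts reported next gives a precision of about 0.15 and a recall of about 0.84 at the 0.5 threshold; the same arithmetic on the Gradient Boosting counts gives roughly 0.16 and 0.73. That is the false-positive versus missed-churner trade-off discussed below, expressed as rates.</p><p>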
The <em>Logistic Regression</em> results are below:</p><pre>True Negatives (TN)  = 1880 | False Positives (FP) = 1175<br>False Negatives (FN) =   39 | True Positives (TP)  =  212</pre><p>In other words, the model correctly identified 212 of the 251 actual churners, missed 39, and flagged 1,175 non-churners as false positives.</p><p>For <em>Gradient Boosting</em>, the results were as follows:</p><pre>True Negatives (TN)  = 2112 | False Positives (FP) =  943<br>False Negatives (FN) =   67 | True Positives (TP)  =  184</pre><p>Compared with Logistic Regression, Gradient Boosting had fewer false positives (943 vs 1,175) but missed more actual churners (67 vs 39).</p><h4>Key Drivers of Churn</h4><p>I also identified the key drivers of churn from the feature importances and coefficient directions of the two models. <em>Logistic Regression</em> coefficients reveal both the <strong><em>direction </em></strong>and<strong><em> magnitude</em></strong> of each feature’s influence, as shown in the plot below. The interpretation is as follows:</p><ul><li><strong>Negative coefficients</strong> (reduce churn risk): <em>Secondary education</em>, <em>Primary education</em>, <em>Coast region</em>, <em>Higher branch visits</em>, etc.</li><li><strong>Positive coefficients</strong> (increase churn risk): e.g., <em>KYC verified</em>.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*yHTJX6XQZgQ3FkHpudCxaw.png" /></figure><p>The bars are color-coded: <strong>red</strong> features push churn risk up and <strong>blue</strong> features push it down, which makes the plot easier to interpret.</p><p>For example, customers with <em>secondary education</em> or from the <em>coast region </em>may be so well integrated into the banking system that they don’t want to leave. 
<em>KYC verified</em> works the opposite way: customers with this feature are more likely to leave the bank, or may already be on their way out.</p><p><em>Gradient Boosting</em> feature importances, on the other hand, show predictive power rather than direction, as in the following feature importance plot:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ERhcAAmePXS7LfcAtZA3Aw.png" /></figure><p>The plot shows that:</p><ul><li>As with logistic regression, the most influential features are <em>Education level (Secondary, Primary)</em>, <em>Branch visits</em>, <em>Region (Coast)</em>, and <em>Complaints count</em>, in that order.</li><li>Other factors include <em>Employment status</em>, <em>Loan repayment rate</em>, and <em>KYC verification status</em>.</li></ul><p>It’s worth noting that GBM ranks education_level_Secondary as by far the most important predictor (importance 0.416). However, importance alone doesn’t tell you whether a feature is protective or risky. Have a look at SHAP values <a href="https://shap.readthedocs.io/en/latest/example_notebooks/overviews/An%20introduction%20to%20explainable%20AI%20with%20Shapley%20values.html">here</a>; they address this shortfall of plain feature importances by showing the direction as well as the size of each feature’s effect.</p><h4>Business Implications</h4><p>Beyond the above metrics, churn prediction models need to demonstrate <strong>business value</strong>, specifically how well they help the bank target retention campaigns. Two types of charts are useful here:</p><ul><li><strong>Cumulative Gains Chart — </strong>This shows the proportion of all actual churners captured as you target a growing share of customers, ranked by predicted churn probability. 
These are shown on the left side of the plots below.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hN8maTXigiU8fBmtV-i5gw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*4xpUAjSOpK95NeeQLdcyUQ.png" /><figcaption>Cumulative Gains and Lift Charts</figcaption></figure><p>Their interpretation is quite simple. The diagonal dashed line represents random targeting, i.e., what you would get without a model. For both algorithms, the curves rise steeply above this diagonal, which indicates that the models identify churn-prone customers early in the targeting process. For example, targeting the <strong>40%</strong> of customers most likely to churn captures over <strong>80%</strong> of actual churners. This is significantly better than random targeting, which would capture only <strong>~40%</strong> of churners with the same effort.</p><ul><li><strong>Lift Charts </strong>are on the right side of the gains charts above. They tell us <em>how much better</em> the model is at identifying churners than random selection. A lift of 3 means the targeted group contains three times as many churners as a randomly chosen group of the same size.</li></ul><p>From the above results, <em>Gradient Boosting</em> delivers the highest lift at the very top of the ranking, reaching a <strong>~4.8×</strong> improvement over random when targeting the <strong>top 1%</strong> of customers. It then stabilizes at a lift of ~2.0–2.5 at targeting depths of around <strong>20–40%</strong> of the population. 
On the other hand, <em>Logistic Regression</em> starts with a maximum lift of <strong>~3×</strong> in the <strong>top 1%</strong> and maintains a lift above <strong>2</strong> for a large portion of the ranked list, indicating consistent value.</p><p><strong>Business Takeaway from This Analysis:</strong></p><p>For a bank like this, these charts highlight the practical advantage of using predictive churn models. Instead of spreading retention efforts across the entire customer base, targeting just the top 20% most at-risk customers captures roughly 45–47% of the actual churners in the test set (e.g., 0.1785 × 661 targeted ÷ 251 churners ≈ 0.47 for Logistic Regression), and targeting the top 40% captures over 80% of them. This translates into <strong>lower intervention costs</strong> (fewer SMS, calls, or retention offers) and <strong>higher ROI</strong> for customer retention campaigns. The interpretation is as follows:</p><ul><li><strong>For highly targeted campaigns</strong> (e.g., the top 10% of at-risk customers), <em>Gradient Boosting</em> is slightly stronger and captures more churners early.</li><li><strong>For broader campaigns</strong> (the top 20% or more), <em>Logistic Regression</em> is competitive and has the advantage of interpretability, a crucial factor when explaining risk drivers to non-technical business stakeholders.</li><li>Both models highlight similar churn drivers, with <strong>education level</strong>, <strong>branch visit patterns</strong>, and <strong>regional differences</strong> being strong predictors, which is consistent with the assumptions built into the simulated data. Remember that this is likely to be very different with a real dataset.</li></ul><h3>Conclusion</h3><p>I introduced <em>propensity modelling</em>, a vital skill for product data scientists and analysts. I also took a deep dive into the process of simulating realistic customer data for a <em>propensity-to-churn</em> use case. 
I demonstrated strategies for customer data simulation, feature engineering, and finally building predictive models to estimate the propensity to churn. While simulated data lacks the messiness and unpredictability of real-world datasets, it allowed us to focus on the core workflow: data preparation, model training, evaluation, and interpretation.</p><p>The results show that both <em>Logistic Regression</em> and <em>Gradient Boosting</em> can effectively identify high-risk customers, with Gradient Boosting having a slight edge in early-decile lift. Importantly, the evaluation metrics, cumulative gains, and lift charts make it clear that targeted retention strategies can capture a large proportion of churners (over 80% when targeting 40% of customers) while engaging only part of the customer base.</p><p>In a real banking context, such models can help allocate retention budgets more efficiently, reduce customer attrition, and ultimately protect revenue. However, it’s worth noting that model performance will depend heavily on data quality, feature richness, and continuous retraining to adapt to changing customer behavior.</p><p>This approach can be extended beyond churn to other key business challenges such as cross-selling, all driven by the same principle: using data-driven insights to focus efforts where they matter most. Please let me know in the comments whether I should write related articles. As usual, my code is open source and available <a href="https://github.com/wandabwa2004/churn_in_finance/tree/master">here</a>. Feel free to clone it and re-run it. And I hope you’ll remember to take another look at the Random Forest algorithm.</p><p>I hope this walkthrough was useful. Don’t forget to follow me, clap for me, and leave a comment. 
If you want to check out my other articles:</p><ul><li><a href="https://medium.com/data-science-collective/building-a-rag-system-with-mmr-for-safaricoms-smart-assistant-1e9ba91b9bfe">Building a RAG System with MMR for Safaricom’s Smart Assistant</a></li><li><a href="https://hermanwandabwa.medium.com/finetuning-llama-2-model-on-safaricoms-product-related-faqs-c9b226a43106">Drive Customer Success: Supercharging Safaricom’s Product FAQs with Llama 2 Model</a></li><li><a href="https://medium.com/data-science-collective/from-data-to-dialogue-development-of-a-retrieval-augmented-generation-rag-chatbot-for-fitness-b9fbaf818ace">How to Create an Entire RAG System as a Newbie</a></li><li><a href="https://hermanwandabwa.medium.com/optimizing-equipment-maintenance-planning-with-deepseek-reasoning-llm-and-agents-crewai-a063114f8bb6">AI-Powered Equipment Maintenance Planning: Leveraging DeepSeek LLM and CrewAI for Smarter Decisions</a></li></ul><hr><p><a href="https://medium.com/data-science-collective/stop-guessing-who-will-leave-how-i-would-predict-customer-churn-before-it-happens-300d34cbd13c">STOP Guessing Who Will Leave — How I Would Predict Customer Churn Before It Happens</a> was originally published in <a href="https://medium.com/data-science-collective">Data Science Collective</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[From Code to Coping: Inside an AI Chatbot Designed for Teen Well-being]]></title>
            <link>https://medium.com/data-science-collective/from-code-to-coping-inside-an-ai-chatbot-designed-for-teen-well-being-4a0869a8fbc8?source=rss-58e995b6d0e3------2</link>
            <guid isPermaLink="false">https://medium.com/p/4a0869a8fbc8</guid>
            <category><![CDATA[python]]></category>
            <category><![CDATA[agentic-ai]]></category>
            <category><![CDATA[therapy]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[teenagers]]></category>
            <dc:creator><![CDATA[Herman Wandabwa]]></dc:creator>
            <pubDate>Sat, 17 May 2025 13:43:50 GMT</pubDate>
            <atom:updated>2025-05-26T21:57:07.627Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*crKhOe-PkAUmRznufRTldw.png" /><figcaption>Image generated by the author</figcaption></figure><p><strong>Stuck behind a paywall? </strong><a href="https://hermanwandabwa.medium.com/from-code-to-coping-inside-an-ai-chatbot-designed-for-teen-well-being-4a0869a8fbc8?source=friends_link&amp;sk=91cbec33adf70662805959565298abbe"><strong>Read for free</strong></a><strong>!</strong></p><p>I took a short break from writing to focus on some pressing AI-related work commitments. This gave me time to think about fresh ideas I could explore in the AI space, especially ones that can have a real impact on everyday people.</p><p>So, last week, I spent some time thinking deeply about how AI can support mental health therapy for teenagers, specifically in Kenya and other developing countries. Teenagers in these regions face growing mental health challenges due to academic pressure, fears around unemployment, and the emotional toll of poverty and family instability. Yet, access to professional help remains limited.</p><p>There are very few trained counselors and healthcare centers to handle such cases. In addition, cultural barriers and the cost of therapy make it even harder for teens to reach out. In fact, I recently spoke with a counseling professional who was concerned about the high cost of proper therapy in Kenya. That cost is even harder to bear relative to the median wage in a country where about <a href="https://www.knbs.or.ke/reports/kenya-poverty-report-2022/#:~:text=Overview,39.8%20per%20cent%20in%202022.">39.8%</a> of people live below the poverty line.</p><p>This got me really interested in exploring how AI can be used to tackle this issue, and I saw <a href="https://blogs.nvidia.com/blog/what-is-agentic-ai/">agentic AI</a> as a promising solution. AI agents are scalable, always available, and can provide immediate support. 
My goal with this mini-project is simple: <em>create a safe, accessible, AI-powered space for Kenyan teens to explore their feelings and learn coping strategies</em>.</p><p>Let me be clear — this is <strong>NOT</strong> meant to replace human therapists. It’s just a supportive tool for young people who might otherwise go unheard.</p><h4>Foundations of AI and Counselling Psychology</h4><p>To ensure the AI system I built is not only technically sound but also clinically meaningful, I grounded its design in evidence-based practices from adolescent mental health research.</p><p>Effective therapy for teens isn’t just about giving advice. It’s about building trust and respecting the cultural context they live in. Teaching emotional and cognitive skills, empowering teens, and ensuring their safety during crises also need to be factored in.</p><p>Here are the key principles that guided the development of each agent in the AI system:</p><p>✅ <strong>Emotional validation </strong>— Teenagers need to feel seen, heard, and accepted without judgment. Research shows that the therapeutic alliance (the bond between therapist and client) is one of the strongest predictors of positive outcomes (Shirk &amp; Karver, 2003). That’s why I created the <em>Empathy Agent </em>to simulate this connection by offering warm, validating responses to their concerns.</p><p>✅ <strong>Cultural relevance</strong> — Mental health interventions are most effective when they align with the cultural background of the person receiving them (Sue et al., 2009). In Kenya, this means recognizing the languages teens actually use, like Sheng and Kiswahili. It also means understanding family roles, spiritual beliefs, and common stressors such as academic pressure and unemployment. To address this, I added the <em>Cultural Agent</em>, which ensures responses resonate with Kenyan teens. 
This agent can be adapted for different demographics.</p><p>✅ <strong>Psychoeducation and cognitive-behavioral skills </strong>— Cognitive Behavioral Therapy (CBT) is one of the most evidence-based treatments for adolescent depression and anxiety (Weisz et al., 2017). Teaching teens how thoughts, emotions, and behaviors interact builds resilience and self-awareness. The <em>CBT </em>and <em>Coping </em>agents are meant to deliver bite-sized psychoeducation tools to help with this.</p><p>✅ <strong>Empowerment and connection to resources </strong>— Adolescents are more likely to thrive when they feel in control of their lives and are connected to support networks (Zimmerman, 1995). The <em>Goal Agent </em>in the framework helps teens clarify and break down personal goals, while the <em>Resource Agent </em>links them to real-world services in Kenya, bridging the gap between AI and human care.</p><p>✅ <strong>Crisis management </strong>— Safety comes first. Following global suicide prevention guidelines (WHO, 2014), I included a <em>Crisis Agent </em>that prioritizes risk assessment and provides grounding techniques and emergency contact info when high-risk language is detected. High-risk keywords include:</p><pre># Matched as lowercase substrings, so short entries such as &quot;die&quot;<br># can also fire inside longer words (e.g. &quot;studied&quot;).<br>risk_keywords = [<br>    &quot;suicide&quot;, &quot;kill myself&quot;, &quot;end my life&quot;, &quot;harm myself&quot;,<br>    &quot;hurt myself&quot;, &quot;die&quot;, &quot;death&quot;, &quot;no point&quot;, &quot;give up&quot;, &quot;hopeless&quot;,<br>    &quot;abuse&quot;, &quot;hitting me&quot;, &quot;beating me&quot;, &quot;harming me&quot;,<br>    &quot;scared&quot;, &quot;terrified&quot;, &quot;trapped&quot;, &quot;emergency&quot;,<br>]</pre><h3>AI Agents</h3><p>Each of the above psychological foundations informed the design of the agentic AI framework. 
Each agent was mapped directly to a counselling psychology principle as follows:</p><ol><li><strong>The Empathy Agent </strong>provides emotional validation and builds trust with the teen.</li><li><strong>The Cultural Agent </strong>— Ensures relevance and respect for the Kenyan teen’s lived experience. It can be customised for other cultures and demographics.</li><li><strong>The Cognitive Behavioral Therapy (CBT) and Coping Agents </strong>— These agents deliver psychoeducation and teach practical coping strategies based on the problem at hand.</li><li><strong>The Goal Agent </strong>encourages empowerment by helping teens set and track personal goals.</li><li><strong>The Resource Agent</strong> — Connects teens with real-life support services in Kenya.</li><li><strong>The Crisis Agent </strong>— Activates in emergencies, providing immediate crisis contacts like <a href="https://childlinekenya.co.ke/">Childline Kenya</a> (116 or +254 722 116116) and <a href="https://befrienders.org/find-support-now/befrienders-kenya/">Befrienders Kenya</a> (+254 722 178 177). This agent’s main role is to express concern and guide teens toward human help, which is essential in moments like these.</li></ol><h3>Agentic AI Pipeline</h3><p>By combining these agents, the system mimics the layered, integrative approach a human therapist would use. Here’s how the flow works:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zNYYFz07_Urnttgcxpi8pg.png" /><figcaption>Flowchart showing the logic in the agentic AI system</figcaption></figure><p>When a user inputs a message, the system scans it for risk keywords to assign a risk score. If the message indicates a potential crisis (e.g., mentions of suicide or abuse), the system immediately switches to crisis mode. The risk score that drives this switch is computed in a very simple way. 
In the code below, the score is a base of 0.1 plus 0.1 per crisis-keyword hit, capped at 1.0. Because the routing threshold is 0.15, a single hit (a score of 0.2) is already enough to switch the system to the crisis workflow, which in turn triggers the agents designed for crisis response.</p><pre># Illustrative wrapper: the fragment&#39;s enclosing function isn&#39;t shown in the article,<br># so this name and signature are hypothetical.<br>def compute_risk_score(text, risk_keywords):<br>    text_lower = text.lower()<br>    hits = sum(1 for word in risk_keywords if word in text_lower)<br>    if hits == 0:<br>        return 0.0<br>    # Simple scoring: base score + increment per hit, capped at 1.0<br>    # Ensures any hit gives a noticeable score.<br>    return min(0.1 + 0.1 * hits, 1.0)</pre><p>If no high-risk words are detected, the system follows the standard workflow:</p><ol><li>First, it applies cultural context analysis to make sure the response is culturally relevant.</li><li>Second, it offers empathetic listening.</li><li>Finally, it branches into cognitive-behavioral guidance, coping strategies, goal setting, or resource recommendations, depending on the words in the teen’s message.</li></ol><p>All agent outputs are then combined into a single, coherent response using a function like this:</p><pre>def combine_agent_responses(step_responses, risk_score):<br>    &quot;&quot;&quot;<br>    Combine multiple agent responses into a coherent chat message.<br>    Prioritizes crisis response if risk is high.<br>    Otherwise, attempts to weave together empathy, advice/resources, and a supportive closing.<br>    Args:<br>        step_responses (list[str]): A list of strings, where each string is the output<br>                                    of a sequential LLM call simulating an agent&#39;s task.<br>        risk_score (float): The calculated risk score for the user&#39;s message.<br>    &quot;&quot;&quot;<br>    responses = []<br>    final_response = &quot;I&#39;m here to listen. How can I help you today?&quot; # Default fallback<br><br>    # --- 1. 
Extract Raw Outputs ---<br>    # Ensure step_responses is a list of strings<br>    if isinstance(step_responses, list) and all(isinstance(r, str) for r in step_responses):<br>        responses = [r.strip() for r in step_responses if r and r.strip()]<br>    else:<br>        print(f&quot;Warning: Invalid step_responses format received: {step_responses}&quot;)<br>        # Attempt to use the input directly if it&#39;s a string, otherwise return default<br>        return str(step_responses) if isinstance(step_responses, str) else final_response<br><br>    # --- 2. Handle High-Risk (Crisis) Scenario ---<br>    # Use the risk score threshold defined for routing (0.15)<br>    if risk_score &gt;= 0.15:<br>        print(&quot;Combine Responses: High risk detected, prioritizing crisis output.&quot;)<br>        # In the crisis flow (Cultural -&gt; Empathy -&gt; Crisis -&gt; Coping), the Crisis response is likely the 3rd output.<br>        # The Coping response (4th) might offer grounding, but the Crisis message is paramount.<br>        crisis_response = &quot;It sounds like you&#39;re going through a very difficult time. Please reach out for immediate support by calling Childline Kenya at 116 or Befrienders Kenya at +254 722 178 177. They are available to help you right now.&quot; # Safer default crisis message<br>        if len(responses) &gt;= 3:<br>            # Assume the third response (index 2) is from the Crisis step<br>            # Check if it contains hotline numbers; if so, use it directly.<br>            if &quot;116&quot; in responses[2] or &quot;Befrienders&quot; in responses[2].lower():<br>                 crisis_response = responses[2]<br>            # If not, maybe the 4th (coping) response (index 3) has them? 
Less likely but check.<br>            elif len(responses) &gt;= 4 and (&quot;116&quot; in responses[3] or &quot;Befrienders&quot; in responses[3].lower()):<br>                 crisis_response = responses[3]<br>            # If neither contains hotlines, stick to the default crisis message.<br><br>        # Ensure essential hotline info is present in the final crisis response<br>        if &quot;116&quot; not in crisis_response:<br>            crisis_response += &quot;\nEmergency Help: Call 116 (Childline)&quot;<br>        if &quot;Befrienders&quot; not in crisis_response:<br>             crisis_response += &quot;\nOr call +254 722 178 177 (Befrienders Kenya)&quot;<br><br>        return crisis_response.strip()<br><br>    # --- 3. Handle Standard (Non-Crisis) Scenario ---<br>    print(f&quot;Combine Responses: Standard flow detected. Responses received: {len(responses)}&quot;)<br>    if not responses:<br>         return final_response # Return default if no responses somehow<br><br>    # In standard flow (Cultural -&gt; Empathy -&gt; [CBT/Coping/Goal/Resource]), expect at least 3 responses.<br>    # The first (index 0) is cultural context (often internal note), second (index 1) is empathy, third (index 2) is the specialized response.<br><br>    # Start with Empathy (usually the second response)<br>    empathy_part = &quot;&quot;<br>    if len(responses) &gt;= 2:<br>        # Take the empathy response (index 1), usually short and validating.<br>        empathy_part = responses[1]<br>        # Basic check if it looks like validation<br>        if not (&quot;understand&quot; in empathy_part.lower() or &quot;hear you&quot; in empathy_part.lower() or &quot;sounds like&quot; in empathy_part.lower() or &quot;okay to feel&quot; in empathy_part.lower()):<br>             empathy_part = f&quot;I hear you. 
{empathy_part}&quot; # Add a generic validation if needed<br><br>    # Get the main content (from CBT, Coping, Goal, or Resource agent - usually the last response)<br>    main_content_part = &quot;&quot;<br>    if len(responses) &gt;= 3:<br>        main_content_part = responses[-1] # Assume the last step provides the core advice/resource/prompt<br>        # Remove potential redundancy if it repeats the empathy part<br>        if empathy_part and main_content_part.startswith(empathy_part.split(&#39;.&#39;)[0]): # Check if starts similarly<br>             pass # Keep it as is, agent might have synthesized well<br>        elif empathy_part:<br>             main_content_part = &quot;\n\n&quot; + main_content_part # Add spacing if distinct<br><br>    # Construct the final response<br>    if empathy_part and main_content_part:<br>        final_response = empathy_part + main_content_part<br>    elif main_content_part: # Only got main content<br>        final_response = main_content_part<br>    elif empathy_part: # Only got empathy<br>        final_response = empathy_part + &quot;\n\nWhat else is on your mind?&quot; # Add a prompt<br>    elif responses: # Fallback to last response if logic failed<br>         final_response = responses[-1]<br><br>    # Add a general supportive closing, avoiding redundancy if already present<br>    closing_statement = &quot;\n\nRemember, taking care of yourself is important, and you don&#39;t have to figure everything out alone. 
I&#39;m here to support you.&quot;<br>    if not (&quot;remember&quot; in final_response.lower() or &quot;alone&quot; in final_response.lower() or &quot;support you&quot; in final_response.lower()):<br>         final_response += closing_statement<br><br>    return final_response.strip()</pre><p>The above function ensures the final message is emotionally validating, culturally appropriate, and actionable — whether the user is in crisis or just needs support.</p><h3>Sample workflows Outputs</h3><h4><strong>Crisis Workflow Outputs:</strong></h4><p>Below is an example of a high-risk message that triggered the crisis workflow. From the response, the message was culturally sensitive, empathetic and followed by emergency contact details.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*pcdbueVT6HLS8l1Znto_JA.png" /><figcaption>Sample crisis workflow output</figcaption></figure><p>In another test, the same user sent a follow-up message in Sheng — a local slang blending Swahili and English (Githiora, C. 2018) that is commonly spoken by Kenyan youth. 
The system understood the message and responded appropriately in the same language.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*wbnSbUrjTaxAAGRBNpNXPQ.png" /><figcaption>Crisis follow-up in a non-English language</figcaption></figure><h4>Sample Standard Workflow Output</h4><p>For non-crisis messages, the system follows the standard path:</p><ol><li>It starts with a culturally relevant and empathetic response.</li><li>It then uses keyword routing to determine which agent should respond.</li><li>Finally, it combines and refines the agent outputs into a final, supportive message.</li></ol><p>A sample output of this workflow type is shown below:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Jdgv3OWF5DSA3y8_lO8BoA.png" /><figcaption>Standard workflow output</figcaption></figure><p>As these outputs show, the system adapts well to both crisis and non-crisis situations while staying grounded in psychological principles.</p><p>It’s really that simple. I didn’t share as many code snippets as I usually do in my <a href="https://medium.com/me/stories/public">previous articles</a>, because most of the work is prompting and re-routing. However, as usual, my code is open and can be accessed <a href="https://github.com/wandabwa2004/LLMs/tree/main/Agents/therapy_agent">here</a>. Instructions for running it are in the repo’s README file. Just remember to supply your OpenAI API key for the code to run.</p><p><strong>TL;DR: Below are the key aspects of the project:</strong></p><ol><li>I presented a culturally tailored AI support app built specifically for Kenyan teens. 
The chatbot understands local languages such as Sheng and Kiswahili, and it addresses therapy-related issues while factoring in the cultural context of most Kenyan youth.</li><li>The app is made up of several agents that simulate different therapeutic roles (empathy, CBT, coping, goal-setting, resources, and crisis management) through sequential LLM calls.</li><li>The app can also detect a potential crisis in the user’s message. It does this by identifying high-risk language and redirecting the user to emergency hotlines in Kenya.</li></ol><h3><strong>Summary and future work</strong></h3><p>Blending AI with cultural knowledge is a step forward in delivering mental health support, especially in under-resourced areas. This mini-project suggests it has the potential to turn therapy into a continuous, personalized journey of self-discovery and resilience.</p><p>In conclusion, I remain of the opinion that AI should <strong>NEVER</strong> be a replacement for trained professionals. It should only augment the work of therapists. Humans deserve more than algorithmic responses; they deserve genuine empathy and real human connection.</p><p>Looking ahead, here are a few features I’d like to add to this project if time allows:</p><ol><li>Personalized Long-Term Memory &amp; Goal Tracking: With consent and anonymization, I’m keen on adding a feature that lets the chatbot remember themes, preferred coping strategies, past goals, and so on. This would likely personalise users’ experiences on the app.</li><li>Interactive Therapeutic Tools: AI-driven tools such as mood journaling and CBT-style exercises could be integrated in the future.</li></ol><p>Such features would make the chatbot more engaging and give teens hands-on tools to manage their mental health.</p><p>I hope this walkthrough was useful. Don’t forget to follow me, clap for me, and leave a comment. 
If you want to check out my other articles:</p><ul><li><a href="https://medium.com/data-science-collective/building-a-rag-system-with-mmr-for-safaricoms-smart-assistant-1e9ba91b9bfe">Building a RAG System with MMR for Safaricom’s Smart Assistant</a></li><li><a href="https://medium.com/data-science-collective/from-data-to-dialogue-development-of-a-retrieval-augmented-generation-rag-chatbot-for-fitness-b9fbaf818ace">How to Create an Entire RAG System as a Newbie</a></li><li><a href="https://hermanwandabwa.medium.com/finetuning-llama-2-model-on-safaricoms-product-related-faqs-c9b226a43106">Drive Customer Success: Supercharging Safaricom’s Product FAQs with Llama 2 Model</a></li></ul><h3>References</h3><ul><li>Shirk, S. R., &amp; Karver, M. (2003). Prediction of treatment outcome from relationship variables in child and adolescent therapy. <em>Journal of Consulting and Clinical Psychology</em>, 71(3), 452–464. <a href="https://doi.org/10.1037/0022-006X.71.3.452">https://doi.org/10.1037/0022-006X.71.3.452</a></li><li>Sue, S., Cheng, J. K., Saad, C. S., &amp; Chu, J. P. (2012). Asian American mental health: A call to action. <em>American Psychologist</em>, 67(7), 532–544. <a href="https://doi.org/10.1037/a0028900">https://doi.org/10.1037/a0028900</a></li><li>Weisz, J. R., Kuppens, S., Ng, M. Y., Eckshtain, D., Ugueto, A. M., Vaughn-Coaxum, R., &amp; Fordwood, S. R. (2017). What five decades of research tells us about the effects of youth psychological therapy. <em>American Psychologist</em>, 72(2), 79–117. <a href="https://doi.org/10.1037/a0040360">https://doi.org/10.1037/a0040360</a></li><li>Zimmerman, M. A. (1995). Psychological empowerment: Issues and illustrations. <em>American Journal of Community Psychology</em>, 23(5), 581–599. <a href="https://doi.org/10.1007/BF02506983">https://doi.org/10.1007/BF02506983</a></li><li>World Health Organization. (2014). <em>Preventing Suicide: A Global Imperative</em>. 
<a href="https://apps.who.int/iris/handle/10665/131056">https://apps.who.int/iris/handle/10665/131056</a></li><li>Githiora, C. (2018). <em>Sheng: Rise of a Kenyan Swahili Vernacular</em>. Boydell &amp; Brewer. <a href="https://doi.org/10.2307/j.ctv1ntfvm">https://doi.org/10.2307/j.ctv1ntfvm</a></li></ul><hr><p><a href="https://medium.com/data-science-collective/from-code-to-coping-inside-an-ai-chatbot-designed-for-teen-well-being-4a0869a8fbc8">From Code to Coping: Inside an AI Chatbot Designed for Teen Well-being</a> was originally published in <a href="https://medium.com/data-science-collective">Data Science Collective</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Creating a Personalized Safari Guide with Agentic AI and Text-to-Speech (TTS)]]></title>
            <link>https://medium.com/data-science-collective/creating-a-personalized-safari-guide-with-agentic-ai-and-text-to-speech-tts-343a7f354ff8?source=rss-58e995b6d0e3------2</link>
            <guid isPermaLink="false">https://medium.com/p/343a7f354ff8</guid>
            <category><![CDATA[personalization-in-ai]]></category>
            <category><![CDATA[multimodal-ai]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[text-to-speech]]></category>
            <category><![CDATA[agentic-ai]]></category>
            <dc:creator><![CDATA[Herman Wandabwa]]></dc:creator>
            <pubDate>Mon, 31 Mar 2025 09:19:13 GMT</pubDate>
            <atom:updated>2026-04-19T02:32:13.151Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*rt90eMXYAXYEwWBVp1JtZg.png" /><figcaption>Image generated by the author</figcaption></figure><p>I was talking to a friend the other day about advances in <a href="https://www.ibm.com/think/topics/text-to-speech"><strong>Text To Speech (TTS)</strong></a> and <a href="https://www.ibm.com/think/topics/speech-to-text"><strong>Speech To Text (STT)</strong></a> systems in the ever-evolving AI world, and I realized this is one area I haven’t really explored. For those who may not be aware, TTS is the technology that converts text on a digital interface into natural-sounding audio.</p><p>In the past, I’ve written about <a href="https://medium.com/data-science/capacity-optimization-in-freight-trains-part-1-4918f35a6433">optimisation</a>, <a href="https://medium.com/data-science-collective/building-a-rag-system-with-mmr-for-safaricoms-smart-assistant-1e9ba91b9bfe">Retrieval-Augmented Generation (RAG)</a>, <a href="https://hermanwandabwa.medium.com/optimizing-equipment-maintenance-planning-with-deepseek-reasoning-llm-and-agents-crewai-a063114f8bb6">agents</a>, and <a href="https://medium.com/swlh/6-kgs-lost-in-31-days-of-covid-19-lockdown-a-data-analytics-perspective-a0061e0689f2">data analytics</a>-driven stories. This time I wanted to look at something slightly different, which led me to TTS advancements and especially their interplay with <a href="https://blogs.nvidia.com/blog/what-is-agentic-ai/">agentic AI</a>. <em>How could agentic AI be combined with TTS?</em></p><p>After much thought, I settled on building a customized <strong>agentic AI-powered safari guide</strong> application. Local safari guides are usually great, but as humans, we tend to repeat much of what we’ve memorized over time. That is what led me to the idea of an AI-driven tour guide. 
Think of a local guide you might meet in a museum, in a gallery, or on safari, but one with a massive knowledge base that is driven by your interests. All you need to do is prompt the AI tour guide, and you get highly relevant, up-to-the-minute audio tour information. This could even include updates fused with live weather data specific to the location. Enough with the stories; let&#39;s get to the interesting parts.</p><h3>1. Agentic AI-Powered Architecture</h3><p>As mentioned in the introduction, I chose agents to develop this application. For those who may not be aware, <a href="https://aws.amazon.com/what-is/ai-agents/">agents</a> are software programs that can interact with their environment to collect data and perform tasks to meet a predetermined goal. A good example is a robot barista that’s been instructed to serve the perfect cup of coffee. It doesn’t just follow a script. It checks the weather (maybe you’d prefer something iced on a hot day) and adjusts the brew. If you ask for something unusual, say a lavender oat milk cortado with half the sweetness, it searches its recipes, adapts, and serves it up. If it can’t meet your requirements, it can ask you for help or suggest alternatives. That’s how AI agents work: rather than just following instructions, they think through the process of reaching the goal, step by step, with whatever tools and information they have.</p><p>This app is orchestrated around a core controller called SafariManager. The class doesn’t generate any content itself; instead, it delegates tasks to domain-specific agents. The generation process starts with the user entering their safari preferences for the location they are visiting, and these preferences form the basis of what each agent produces. In the context of a tour, the primary agents work as follows. 
For example, imagine you are at the <a href="https://www.maasaimara.com/">Maasai Mara National Reserve</a> in Kenya; the agents would pull information as follows:</p><ol><li><strong>Biodiversity agent:</strong> Describes the local Mara ecosystems, historical wildlife data, and other engaging ecological narratives.</li><li><strong>Environmental agent:</strong> Provides up-to-date environmental conditions, including highlights of sustainable practices and local environmental challenges in the Mara area.</li><li><strong>Culture agent:</strong> Describes cultural narratives in detail, including local Maasai traditions, keeping the conversation warm and respectful.</li><li><strong>Safety agent:</strong> Curates safety guidelines, including navigation tips, weather alerts, and general safety measures, such as tips for navigating muddy roads in the reserve. This agent performs a web search to get up-to-date safety data.</li><li><strong>Planner agent:</strong> A vital agent that plans the audio tour. It analyses the user’s location, interests, and tour duration to determine the optimal time allocation for each of the specialist agents above, plus the introduction and conclusion sections.</li><li><strong>Orchestrator:</strong> Stitches the content from the other agents into a natural, flowing narrative, adding an introduction and a brief conclusion.</li></ol><p>Remember that each agent is prompted with structured instructions and given a specific role. The following diagram visualizes this simplified flow:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*qjaLSV3iY1K7sLpOy5ks8g.png" /><figcaption>Pipeline for the whole framework</figcaption></figure><h3>2. User Input to Agent Execution</h3><p>I used a <a href="https://streamlit.io/">Streamlit</a> interface to capture most of the user inputs. 
It&#39;s a simple, Python-native framework that makes it easy to integrate modules such as the ones in the above architecture. The following details are captured in the interface:</p><ul><li>Safari location (e.g., Amboseli, Tsavo, or Maasai Mara, which I used)</li><li>Topics of interest (biodiversity, environment, etc., corresponding to the agents)</li><li>Preferred language (English or Swahili)</li><li>Duration of the tour (in minutes)</li><li>Guide voice tone (e.g., friendly, professional, etc.)</li></ul><p>The interface is shown below. Remember to enter your OpenAI API key for generation to work; the app could, however, be customised to use open-source models.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*LHZeli4WTnQlVqb65-mKkQ.png" /></figure><p>Once all fields are set and the user clicks the “Generate Safari Tour” button, the backend calls this code:</p><pre>final_tour = run_async(<br>    mgr.run, location, interests, duration, st.session_state.get(&quot;LANGUAGE&quot;)<br>)</pre><p>In the above code, <em>mgr</em> is an instance of the SafariManager class. Calling mgr.run generates an OpenAI trace ID for observability. The planner agent is invoked first to allocate minutes across sections based on word counts, assuming a speech rate of 150 words per minute. You can adjust the words-per-minute setting if you want the speech slower or faster.</p><p>Each agent is then triggered using the runner.run abstraction. Every prompt includes the language, location, tone guidance, and word limit. The outputs are then assembled by the orchestrator agent, whose job, as mentioned above, is to put together the final narrative with smooth transitions between sections. The output is quite detailed and interesting. 
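</p><p>To make the planner’s job more concrete, here is a minimal sketch of how a tour duration could be split across sections and turned into word limits at 150 words per minute. In the app this allocation is produced by the planner agent (an LLM call), so the helper name allocate_word_budgets, the section names, and the weights below are hypothetical assumptions rather than the repo’s code.</p>
```python
# Hypothetical sketch: in the app, the planner agent (an LLM call) decides the
# allocation. The section names and weights here are illustrative assumptions.
WORDS_PER_MINUTE = 150  # speech rate mentioned in the article


def allocate_word_budgets(duration_min, sections, weights=None):
    """Split a tour duration across sections and convert minutes to word limits."""
    if weights is None:
        # Default: give every section an equal share of the tour.
        weights = {name: 1.0 for name in sections}
    total_weight = sum(weights[name] for name in sections)
    budgets = {}
    for name in sections:
        minutes = duration_min * weights[name] / total_weight
        budgets[name] = {
            "minutes": round(minutes, 2),
            # Truncate so the narration never exceeds its time slot.
            "max_words": int(minutes * WORDS_PER_MINUTE),
        }
    return budgets


sections = ["introduction", "biodiversity", "culture", "safety", "conclusion"]
weights = {"introduction": 0.5, "biodiversity": 2, "culture": 1.5,
           "safety": 0.5, "conclusion": 0.5}
plan = allocate_word_budgets(5, sections, weights)
for name, budget in plan.items():
    print(f"{name}: {budget['minutes']} min, up to {budget['max_words']} words")
```
<p>Because the example weights sum to 5 and the tour lasts 5 minutes, each weight maps directly to minutes, and the word limits add up to 750 words (5 × 150). Limits like these are what would go into each agent’s prompt as its word-limit choice.</p><p>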
A sample run output in English is shown below:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*sTG_aqWYGk9UykJSPPCfwg.png" /><figcaption>5-minute narrative in English</figcaption></figure><p>The Swahili narrative below is also good; I even learnt a few new Swahili words from it.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*2BBF5CivL16_m09Ewu3VZg.png" /><figcaption>5-minute narrative in Swahili</figcaption></figure><h3>3. Generating the Audio Tour</h3><p>The last step is to convert the above outputs to audio, and that’s where TTS comes in. The tts() function below handles this part. I used <a href="https://platform.openai.com/docs/guides/text-to-speech">OpenAI&#39;s speech endpoint</a> for the conversion. The endpoint requires the model (gpt-4o-mini-tts in this case), the text to turn into audio (the generated narrative), and the voice that speaks it (<a href="https://www.openai.fm/">nova</a>). All the configuration options are documented <a href="https://platform.openai.com/docs/api-reference/audio/createSpeech">here</a>.</p><pre>from openai import OpenAI<br><br>client = OpenAI()  # Reads OPENAI_API_KEY from the environment<br><br>def tts(text, speech_file_path):<br>    # Signature is illustrative; the repo&#39;s tts() may differ.<br>    # with_streaming_response writes the audio to disk as it arrives.<br>    with client.audio.speech.with_streaming_response.create(<br>        model=&quot;gpt-4o-mini-tts&quot;,<br>        voice=&quot;nova&quot;,<br>        input=text,<br>        instructions=&quot;&quot;&quot;<br>        You are a friendly, knowledgeable safari guide. Speak naturally and conversationally...<br>        &quot;&quot;&quot;,<br>    ) as response:<br>        response.stream_to_file(speech_file_path)</pre><p>You can download and listen to the audio files I generated <a href="https://github.com/wandabwa2004/AI_Safari_Guide/tree/dev/outputs"><strong>here</strong></a>; they are in Swahili and English. Let me know in the comments whether the audio quality matches your expectations. That’s it for the build itself.</p><h3>4. Future Directions</h3><p>The above pipeline demonstrates how practical agentic AI and TTS systems can be built. 
Beyond safaris, similar setups could be used in <em>museum walkthroughs</em>, <em>classroom storytelling</em>, and <em>cultural heritage apps</em>.</p><p>Because the agents are independent, you can add new modules or update existing ones without rewriting the whole app. A few notable improvements could be made, and I’d suggest exploring them in your free time:</p><ul><li><strong>Human-in-the-loop feedback:</strong> Let users refine the agent outputs before final assembly, so they get highly personalised results.</li><li><strong>Personalization memory</strong>: Remember user preferences across sessions or from their tour history, enabling hyper-personalised outputs.</li><li><strong>Multimodal input</strong>: This is ambitious, but letting users upload images of the landscapes or animals around them and receive guided commentary would be golden. Let me know in the comments whether you’d like an article on this.</li></ul><h3>5. Conclusion</h3><p>AI Safari Guide is more than just a cool TTS and agentic AI demo; it’s a fully functional, extensible AI system. The app combines conversational generation and audio synthesis to deliver a personalised audio guide tailored to the Maasai Mara, and it can be adapted to any other location if needed.</p><p>As usual, my code is open-sourced in this <a href="https://github.com/wandabwa2004/AI_Safari_Guide/tree/dev"><strong>repo</strong></a>. Clone it, set up your environment, and install the packages listed in requirements.txt.</p><p>If you enjoyed this write-up or found it useful, please <strong>leave a clap</strong>, <strong>follow me</strong>, or <strong>drop a comment</strong> with your thoughts and feedback. 
If you want to check out my other articles:</p><ul><li><a href="https://hermanwandabwa.medium.com/uncovering-patterns-and-trends-in-ausgrid-power-outage-data-ec538d4f70f9">Uncovering Patterns and Trends in Ausgrid Power Outage Data</a></li><li><a href="https://medium.com/data-science-collective/from-data-to-dialogue-development-of-a-retrieval-augmented-generation-rag-chatbot-for-fitness-b9fbaf818ace">How to Create an Entire RAG System as a Newbie</a></li><li><a href="https://medium.com/data-science-collective/building-a-rag-system-with-mmr-for-safaricoms-smart-assistant-1e9ba91b9bfe">Building a RAG System with MMR for Safaricom’s Smart Assistant</a></li></ul><hr><p><a href="https://medium.com/data-science-collective/creating-a-personalized-safari-guide-with-agentic-ai-and-text-to-speech-tts-343a7f354ff8">Creating a Personalized Safari Guide with Agentic AI and Text-to-Speech (TTS)</a> was originally published in <a href="https://medium.com/data-science-collective">Data Science Collective</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>