Automating List Comparison with Python: A Fuzzy Matching Solution for Client Conversion

Automate the Boring Tasks, So You Can Do What Really Matters... TikTok!

Introduction: The Problem at Work

Not long ago, I found myself facing a familiar challenge at work: comparing two client lists to identify the potential clients who had not yet been converted into actual clients. This was the second time I had to solve this problem, and I realized it was one of those repetitive tasks that called for automation. If I was going to keep facing this problem, I wanted a reusable solution that I could run again without breaking a sweat.

The task was straightforward: I had two lists, one containing clients and another containing potential clients, and I needed to figure out who on the potential clients list had not yet been converted. But there was a catch. The clients list wasn’t exhaustive, and some of the clients were missing from the potential clients list altogether. On top of that, the data was far from perfect: client names were sometimes misspelled or varied in format. This meant that a simple exact comparison between the two lists wouldn’t be sufficient.

After dealing with this challenge before, I decided to automate the task, making sure I never had to go through this headache again.

Choosing the Right Tools

I usually write my scripts in JavaScript, but for this task I felt that Python would be the best tool for the job. Why? Python is widely used for automation, and libraries like pandas and fuzzywuzzy are well suited to working with messy data. I’d always heard great things about Python’s data manipulation capabilities, and this felt like the right moment to give it a try.

Before diving into the script, I made sure that my Python environment was set up correctly. It turned out everything was already in place, which was a relief. I just had to confirm that I had Python installed, and I was good to go.

Discovering the Fuzzywuzzy Library

I’ve come across fuzzy matching in my JavaScript endeavors before, so when I found out that Python had the fuzzywuzzy library, I knew the task just became a whole lot easier. The fuzzywuzzy library helps with comparing strings in a way that accounts for small differences—whether those are typos, alternative spellings, or other minor inconsistencies.

Once I installed the library with pip install fuzzywuzzy, I could immediately start exploring how it works. This was exciting because it meant I didn’t have to reinvent the wheel. Instead, I could use a proven tool to solve my problem. (fuzzywuzzy runs fine on its own, but it prints a warning recommending the optional python-Levenshtein package, which speeds up the comparisons.) It felt like the pieces were falling into place.
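To get a feel for what a similarity score looks like, here is a minimal sketch that uses Python’s built-in difflib instead of fuzzywuzzy, so it runs without installing anything. The similarity helper is an illustrative name of my own, not part of either library, and its scores should be read as an approximation of what fuzz.ratio reports rather than a guaranteed match.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> int:
    """Return a 0-100 similarity score, ignoring case and surrounding spaces."""
    a, b = a.strip().lower(), b.strip().lower()
    return round(100 * SequenceMatcher(None, a, b).ratio())


# A one-letter typo should score high; an unrelated name should score low.
print(similarity("Samantha Otieno", "Samatha Otieno"))
print(similarity("Samantha Otieno", "Wanjiru Njeri"))
```

The point of the score is that it turns "roughly the same string" into a number you can compare against a threshold.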

The Solution: Code Walkthrough

Now that I had the right tools, it was time to dive into the solution. The problem was simple on the surface: compare the client list with the potential client list to see who hadn’t been converted yet. However, there were a few challenges: the data wasn’t perfect, and there were minor differences between the names, phone numbers, and emails across the lists. Thankfully, fuzzy matching provided the perfect solution for handling those discrepancies.

Understanding the Task

The goal was to compare the clients list to the potential clients list to find out which potential clients were not converted into clients, considering slight variations in their names, phone numbers, or emails.

Code Explanation

Let’s break down how I tackled the problem using Python and the fuzzywuzzy library.

Step 1: Importing Required Libraries

First, I imported the necessary libraries: fuzz and process from fuzzywuzzy. These tools would handle the actual comparison of strings.

from fuzzywuzzy import fuzz, process

Step 2: Define the Matching Function

Next, I wrote a function to compare the names, phone numbers, and emails in the two lists. If any field of an entry in the first list matched the same field somewhere in the second list with a score at or above a certain threshold (say, 85 out of 100), I considered it a match.

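def get_matching_entries(list1, list2, threshold=85):
    """Return the entries of list1 whose name, phone, or email closely matches list2."""
    # extractOne returns None when there are no choices, so bail out early.
    if not list2:
        return []

    # Collect each field of list2 once so it can be searched for every entry.
    names_list2 = [entry["Name"] for entry in list2]
    phones_list2 = [entry["Phone"] for entry in list2]
    emails_list2 = [entry["Email"] for entry in list2]

    matching_list = []

    for entry in list1:
        name = entry["Name"]
        phone = entry["Phone"]
        email = entry["Email"]

        # Best match and its 0-100 score for each field, searched independently.
        _, name_score = process.extractOne(name, names_list2, scorer=fuzz.ratio)
        _, phone_score = process.extractOne(phone, phones_list2, scorer=fuzz.ratio)
        _, email_score = process.extractOne(email, emails_list2, scorer=fuzz.ratio)

        # One field at or above the threshold is enough to count as a match.
        if name_score >= threshold or phone_score >= threshold or email_score >= threshold:
            matching_list.append(entry)

    return matching_list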

In the code above, for each entry in the clients list (list1), I compare its name, phone, and email to the corresponding fields in the potential clients list (list2). The process.extractOne function finds the best match for each field and returns it together with its score, a number from 0 to 100. If any one of the three scores is at or above the threshold (85 by default), I consider it a valid match and add the entry to matching_list. Note that each field is searched independently, so the best name match and the best phone match can come from different entries in list2.
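If it helps to see the idea behind "find the best match and its score" without the library, the sketch below does the same kind of search with the standard library only. It is a simplified stand-in, not fuzzywuzzy’s actual implementation: similarity and best_match are hypothetical helper names, the scoring uses difflib, and returning None for an empty list is a choice made for this sketch.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> int:
    """Return a 0-100 similarity score, ignoring case and surrounding spaces."""
    a, b = a.strip().lower(), b.strip().lower()
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_match(query, choices):
    """Return (choice, score) for the highest-scoring choice, or None if there are no choices."""
    if not choices:
        return None
    scored = [(choice, similarity(query, choice)) for choice in choices]
    return max(scored, key=lambda pair: pair[1])


names = ["Kipkemoi Rutto", "Wanjira Njeri", "Muthoni Kamou"]
match, score = best_match("Wanjiru Njeri", names)
print(match, score)
```

Every query is scored against every choice, so the work grows with the product of the two list sizes. That is fine for a few thousand rows but worth keeping in mind for much larger lists.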

Step 3: Example Data & Execution

Here, I created an example dataset using Kenyan names, phone numbers, and emails. I included variations between the two lists to simulate real-world discrepancies that you might find in data.

list1 = [
    {"Name": "Kipkemoi Ruto", "Phone": "0701234567", "Email": "kipkemoi.ruto@example.com"},
    {"Name": "Wanjiru Njeri", "Phone": "0722345678", "Email": "wanjiru.njeri@example.com"},
    {"Name": "Muthoni Kamau", "Phone": "0733456789", "Email": "muthoni.kamau@example.com"},
    {"Name": "Samantha Otieno", "Phone": "0745567890", "Email": "samantha.otieno@example.com"}
]

list2 = [
    {"Name": "Kipkemoi Rutto", "Phone": "0701234567", "Email": "kipkemoi.ruto@example.com"},
    {"Name": "Wanjira Njeri", "Phone": "0722345678", "Email": "wanjiru.njeri@example.com"},
    {"Name": "Muthoni Kamou", "Phone": "0733456789", "Email": "muthoni.kamau@example.com"},
    {"Name": "Samatha Otieno", "Phone": "0745567890", "Email": "samantha.otieno@example.com"}
]

matching_list = get_matching_entries(list1, list2)

In this example:

  • "Kipkemoi Ruto" vs "Kipkemoi Rutto" (minor typo)

  • "Wanjiru Njeri" vs "Wanjira Njeri" (minor typo)

  • "Muthoni Kamau" vs "Muthoni Kamou" (minor typo)

  • "Samantha Otieno" vs "Samatha Otieno" (misspelling)

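The matching function should identify these as valid matches even though the names aren’t exact matches. In this sample the phone numbers and emails are identical across the two lists, so those fields alone already produce a match; the names, which differ by a single character each, should clear the 85 threshold as well.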
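To check the names on their own, here is a small sketch that scores just the four name pairs from the example. It uses difflib as a stand-in for fuzz.ratio (so the exact numbers may differ slightly from fuzzywuzzy’s), and similarity is a hypothetical helper. The last line illustrates a caveat of applying the same threshold to every field: two phone numbers that differ by one digit, and so may belong to different people, can still clear it.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> int:
    """Return a 0-100 similarity score, ignoring case and surrounding spaces."""
    a, b = a.strip().lower(), b.strip().lower()
    return round(100 * SequenceMatcher(None, a, b).ratio())


name_pairs = [
    ("Kipkemoi Ruto", "Kipkemoi Rutto"),
    ("Wanjiru Njeri", "Wanjira Njeri"),
    ("Muthoni Kamau", "Muthoni Kamou"),
    ("Samantha Otieno", "Samatha Otieno"),
]

# Score each pair of names without looking at phone or email.
name_scores = [similarity(a, b) for a, b in name_pairs]
for (a, b), score in zip(name_pairs, name_scores):
    print(f"{a} vs {b}: {score}")

# Caveat: a one-digit difference between two phone numbers also scores high.
phone_score = similarity("0722345678", "0722345679")
print(f"phone numbers one digit apart: {phone_score}")
```

If near-identical phone numbers are a concern in your data, one option is to require an exact match on phone and email and keep the fuzzy threshold for names only.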

Step 4: Result Interpretation

The result was a list of entries from the clients list that matched the potential clients list on name, phone, or email, making it easy to see who had been converted. Here’s how I printed it:

print(f"Number of matching entries: {len(matching_list)}")
for entry in matching_list:
    print(entry)

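This helped me quickly identify which potential clients had already become clients. The potential clients who are still unconverted are simply the ones with no match on any of the three fields. It was like magic: no manual comparison of names or phone numbers was needed anymore.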
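Since the original goal was to find the potential clients who have not been converted, here is one way that last step could look. This is a sketch under assumptions, not the code from my script: it scores with difflib instead of fuzzywuzzy, the helper names (similarity, is_match, find_unconverted) are made up for illustration, and "Achieng Odhiambo" is a hypothetical extra entry added so that there is someone left unconverted.

```python
from difflib import SequenceMatcher

FIELDS = ("Name", "Phone", "Email")


def similarity(a: str, b: str) -> int:
    """Return a 0-100 similarity score, ignoring case and surrounding spaces."""
    a, b = a.strip().lower(), b.strip().lower()
    return round(100 * SequenceMatcher(None, a, b).ratio())


def is_match(potential, client, threshold=85):
    """True if any one field of the two entries scores at or above the threshold."""
    return any(similarity(potential[f], client[f]) >= threshold for f in FIELDS)


def find_unconverted(potentials, clients, threshold=85):
    """Return the potential clients that match no entry in the clients list."""
    return [
        p for p in potentials
        if not any(is_match(p, c, threshold) for c in clients)
    ]


clients = [
    {"Name": "Kipkemoi Ruto", "Phone": "0701234567", "Email": "kipkemoi.ruto@example.com"},
    {"Name": "Wanjiru Njeri", "Phone": "0722345678", "Email": "wanjiru.njeri@example.com"},
]

potentials = [
    {"Name": "Kipkemoi Rutto", "Phone": "0701234567", "Email": "kipkemoi.ruto@example.com"},
    {"Name": "Wanjira Njeri", "Phone": "0722345678", "Email": "wanjiru.njeri@example.com"},
    # Hypothetical entry with no counterpart in the clients list.
    {"Name": "Achieng Odhiambo", "Phone": "0756678901", "Email": "achieng.odhiambo@example.com"},
]

unconverted = find_unconverted(potentials, clients)
for entry in unconverted:
    print(entry["Name"])
```

Unlike the function in Step 2, this version compares a potential client against one client entry at a time, so a match always refers to a single person rather than a mix of fields from different entries.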

Conclusion: The Power of Automation

Programming is powerful and fun when you’re able to automate tasks that would otherwise be repetitive and time-consuming. I can now run this script any time I face a similar problem, and it saves me hours of manual comparison, so I shouldn’t have to endure this pain again.

The best part? The script is reusable and can be adapted for future tasks. Whether you’re comparing names, emails, or phone numbers across lists, fuzzy matching makes it so much easier to handle inconsistencies and variations in data. If you’re facing a similar challenge, this approach can save you both time and effort.

For me, this experience has reinforced the idea that programming is a tool that empowers you to solve problems smarter, not harder.

You can check out the full solution and code for this project in my GitHub repository. Feel free to explore, clone, and adapt it for your own needs!