In a typical Customer Relationship Management (CRM) system, leads, contacts, and organizations often end up in separate export files, and combining them by hand quickly becomes chaotic. By automating the process with Python and Excel, we can merge three exports (boxes, contacts, and organizations) into one dataset that brings together all the relevant information. This article walks through a script that performs this integration.
Problem Overview
When working with CRM data, you often have separate files for different entities. In this case there are three main data files, originally CSV exports, which the script reads as Excel workbooks:
- Contacts.xlsx: information about individual leads, such as name, email, and company affiliation. Column B holds the key of the box the contact belongs to.
- Boxes.xlsx: metadata about the organizations, such as address, size, and type, one row per box, keyed by column A.
- Org.xlsx: additional organization details, such as contact information, financials, and the related box (column B).
The challenge is to merge this data into a single Excel file where information is combined for easy access and analysis.
Approach
We use the xlwings library, which drives Excel from Python, to read the three workbooks, transform the rows, and merge the datasets. The result is written to an output workbook, Out.xlsx, in which contacts, boxes, and organizations are combined row by row.
Code Walkthrough
1. Loading Data
The first task is to load data from three separate Excel files: Contacts.xlsx, Boxes.xlsx, and Org.xlsx. Using the xlwings library, the code reads data from these files into lists for further processing.
```python
import os
import sys

import xlwings as xw

# Folder that contains this script; the input workbooks are expected there
path = os.path.abspath(os.path.dirname(sys.argv[0]))

# Load Contacts: columns A:M (13 columns), data starting below the header row
fn = os.path.join(path, "Contacts.xlsx")
wb = xw.Book(fn)
ws = wb.sheets[0]
inpContact = ws.range("A2:M10000").value
# Drop empty rows (the fixed range reads far past the last record)
inpContact = [x for x in inpContact if x[0] is not None]
```
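This block opens the Contacts workbook, reads the fixed range A2:M10000 into a list of rows (one list per row), and keeps only the rows whose first cell is not empty, which discards the blank rows below the last record. The Boxes and Organizations workbooks are loaded the same way, and each is also indexed in a dictionary keyed on the box key so that matches can be looked up directly: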
```python
# Load Boxes: columns A:T (20 columns); column A holds the box key
fn = os.path.join(path, "Boxes.xlsx")
wb = xw.Book(fn)
ws = wb.sheets[0]
inpBoxes = ws.range("A2:T10000").value
inpBoxes = [x for x in inpBoxes if x[0] is not None]
dictBoxes = {x[0]: x for x in inpBoxes}  # box key -> box row

# Load Organizations: columns A:L (12 columns); column B holds the related box key
fn = os.path.join(path, "Org.xlsx")
wb = xw.Book(fn)
ws = wb.sheets[0]
inpOrg = ws.range("A2:L10000").value
inpOrg = [x for x in inpOrg if x[0] is not None]
dictOrg = {x[1]: x for x in inpOrg}  # box key -> org row (last row wins on duplicates)
```
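The script reads the workbooks through Excel. Because the data starts out as CSV exports, the same loading and indexing step can also be done with the standard library alone, for example to test the merge logic on a machine without Excel. The sketch below is not part of the original script. `load_rows` and `index_by` are hypothetical helpers, and the sketch assumes each file has a single header row and is UTF-8 encoded. One difference from xlwings is that the csv module keeps every cell as a string, whereas xlwings returns numbers as floats.

```python
import csv
from collections import Counter


def load_rows(path, width):
    """Read a CSV export into fixed-width rows (hypothetical helper).

    Approximates ws.range("A2:...").value plus the None filter: the header
    row is skipped, empty cells become None, each row is padded or cut to
    `width` columns, and rows whose first cell is empty are dropped.
    """
    rows = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row (row 1 in Excel)
        for raw in reader:
            row = [cell if cell != "" else None for cell in raw[:width]]
            row += [None] * (width - len(row))  # pad short rows
            if row[0] is not None:
                rows.append(row)
    return rows


def index_by(rows, key_col):
    """Return a key -> row lookup plus the keys that occur more than once.

    Like the script's dict comprehensions, the last row wins for a repeated
    key; the duplicate list makes that silent overwrite visible.
    """
    counts = Counter(row[key_col] for row in rows)
    duplicates = [key for key, n in counts.items() if n > 1]
    return {row[key_col]: row for row in rows}, duplicates
```

For example, `load_rows("Org.csv", 12)` followed by `index_by(rows, 1)` would mirror `inpOrg` and `dictOrg`. A non-empty duplicate list means some organizations share a box key, and only the last one would appear in the merged output.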
2. Combining Data
Once the data is loaded, the next step is merging it. For each contact in inpContact, we look up the matching box in dictBoxes and the matching organization in dictOrg, using the box key stored in the contact's second column (row[1]).
The following loop appends the box and organization data to each contact row:
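```python
outData = []
finishedBox = set()     # box keys matched through at least one contact
finishedOrgBox = set()  # box keys whose organization was matched through a contact

for row in inpContact:
    workRow = row[:]         # the 13 contact columns (A:M)
    workBox = row[1]         # box key stored in the contact's column B
    workRow.append(workBox)  # copy the key into its own column

    # Add the 20 box columns, or 20 empty cells if the box is unknown
    if workBox in dictBoxes:
        workRow.extend(dictBoxes[workBox])
        finishedBox.add(workBox)
    else:
        workRow.extend([None] * 20)

    # Add two organization columns (A and E of Org.xlsx), or 2 empty cells
    if workBox in dictOrg:
        workRow.append(dictOrg[workBox][0])
        workRow.append(dictOrg[workBox][4])
        finishedOrgBox.add(workBox)
    else:
        workRow.extend([None, None])

    outData.append(workRow)  # 13 + 1 + 20 + 2 = 36 columns
```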
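Here, we iterate over each contact and extend its row with the matching box and organization data. When no match is found, the missing columns are filled with None. That way every output row has the same width (13 contact columns, the copied box key, 20 box columns, and 2 organization columns, 36 in total), and the columns line up in Excel. finishedBox and finishedOrgBox record which box keys were matched, for use in the next step.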
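To check the merge logic without opening Excel, the loop can be written as a pure function over in-memory rows. The sketch below generalizes it slightly using a hypothetical helper, `merge_on_key`, which is not part of the original script. Each source is a `(lookup, width)` pair, and a source with no match contributes `width` empty cells. That padding is what keeps every row the same length.

```python
def merge_on_key(contacts, key_col, sources):
    """Append each source's columns to every contact row (hypothetical helper).

    contacts: list of rows (lists); key_col is the index of the join key
              (1, i.e. column B, in the script).
    sources:  list of (lookup, width) pairs; lookup maps a key to a list of
              exactly `width` values to append.
    Returns the merged rows and, for each source, the set of keys that matched.
    """
    merged = []
    matched = [set() for _ in sources]
    for contact in contacts:
        key = contact[key_col]
        row = list(contact) + [key]  # copy the key into its own column, like the script
        for i, (lookup, width) in enumerate(sources):
            values = lookup.get(key)
            if values is None:
                row.extend([None] * width)  # pad so later columns stay aligned
                continue
            if len(values) != width:
                raise ValueError(f"source {i}: expected {width} values for {key!r}")
            row.extend(values)
            matched[i].add(key)
        merged.append(row)
    return merged, matched
```

For the script's layout, the call would be `merge_on_key(inpContact, 1, [(dictBoxes, 20), (orgFields, 2)])`. Here `orgFields = {k: [v[0], v[4]] for k, v in dictOrg.items()}` is a hypothetical pre-selection of the two Org columns. Adding another CRM export then means adding one more `(lookup, width)` pair.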
3. Handling Unmatched Data
After matching the main entities, the script ensures that any boxes or organizations not already associated with a contact are added to the final output.
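```python
# Boxes that no contact referenced: fill only the box-key and box columns
for row in inpBoxes:
    if row[0] not in finishedBox:
        workRow = [None] * 36
        workRow[13] = row[0]   # box key column (N)
        workRow[14:34] = row   # the 20 box columns (O:AH)
        outData.append(workRow)

# Organizations whose box was not matched through a contact
for row in inpOrg:
    if row[1] not in finishedOrgBox:
        workRow = [None] * 36
        workRow[13] = row[1]   # box key column (N)
        workRow[34] = row[0]   # same two Org columns as in the merge loop (AI, AJ)
        workRow[35] = row[4]
        outData.append(workRow)
```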
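Before the output is written, it can help to audit how the three files line up: which boxes and organizations no contact references, and which contacts point at a box that does not exist. The sketch below compares keys as plain sets and does not change outData. `match_report` is a hypothetical helper, not part of the original script. It ignores empty (None) keys.

```python
def match_report(contact_keys, box_keys, org_box_keys):
    """Summarise how contacts, boxes, and organizations overlap on the box key.

    Each argument is an iterable of box keys; None (an empty cell) is ignored.
    """
    contacts = {k for k in contact_keys if k is not None}
    boxes = {k for k in box_keys if k is not None}
    orgs = {k for k in org_box_keys if k is not None}
    return {
        "contacts_without_box": contacts - boxes,  # contact refers to an unknown box
        "boxes_without_contact": boxes - contacts,  # boxes no contact references
        "orgs_without_contact": orgs - contacts,    # orgs whose box no contact references
        "orgs_without_box": orgs - boxes,           # org refers to an unknown box
    }
```

In the script this would be called as `match_report([r[1] for r in inpContact], [r[0] for r in inpBoxes], [r[1] for r in inpOrg])`. `boxes_without_contact` corresponds to the boxes the first loop appends, and `orgs_without_contact` corresponds to the organizations the second loop is meant to append.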
4. Writing to Output File
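Once all the data is processed, it is written to the first sheet of the output workbook, Out.xlsx. Because xw.Book opens an existing file, Out.xlsx must already exist next to the script. The following code clears the results of any previous run and writes the merged rows starting at A2, which leaves row 1 free for column headers: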
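```python
fn = os.path.join(path, "Out.xlsx")
wb = xw.Book(fn)
ws = wb.sheets[0]
ws.range("A2:AZ10000").clear_contents()  # remove results from the previous run
if outData:
    ws.range("A2").value = outData       # xlwings expands a nested list from the top-left cell
wb.save()
```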
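Each merged row is 36 columns wide: 13 contact columns, the copied box key, 20 box columns, and 2 organization columns. The data therefore lands in columns A:AJ, which sits inside the cleared range A2:AZ10000. The sketch below checks that arithmetic. `column_letter` and `layout` are hypothetical helpers that convert 1-based column numbers to Excel column letters.

```python
def column_letter(index):
    """Convert a 1-based column number to Excel letters (1 -> "A", 27 -> "AA")."""
    letters = ""
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def layout(sections):
    """Map each (name, width) section to its Excel column span, starting at column A."""
    spans, start = {}, 1
    for name, width in sections:
        end = start + width - 1
        spans[name] = f"{column_letter(start)}:{column_letter(end)}"
        start = end + 1
    return spans


if __name__ == "__main__":
    # The merged layout used by the script
    print(layout([("contact", 13), ("box key", 1), ("box", 20), ("org", 2)]))
```

For the script's layout, this places the contacts in A:M, the box key in N, the box data in O:AH, and the organization fields in AI:AJ.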
Conclusion
This solution automates merging contacts, boxes, and organizations into one consolidated Excel sheet. It replaces manual copy-and-paste work and puts each contact's box and organization data on the same row. Because boxes and organizations are loaded into dictionaries keyed on the box key, another CRM export can be added the same way: load it, index it on the box key, and append its columns (padding with None when there is no match) in the merge loop.
This approach makes the data more accessible and shows how consistently organized CRM data supports analysis and decision-making.