Free Python Code and Instructions
for Product Recommendation Engine
The free version and sample data are available for download as a ZIP file, and users can start using them right away. Although we explain everything step by step, we recommend that users have a basic understanding of Python and Jupyter Notebook. One of the main advantages of the End-to-End solution is that non-technical users do not need to follow a complex process manually, as everything is handled automatically.
1. Jupyter Notebook Setup: The easiest way to run the notebook is with Anaconda, as it installs Python, Jupyter Notebook, and common data analysis packages together.
Useful links:
- Download Anaconda:
https://www.anaconda.com/download
- Anaconda installation guide:
https://www.anaconda.com/docs/getting-started/anaconda/install
- Jupyter Notebook installation/help:
https://jupyter.org/install
If you already use Python, install the required packages by running the following command in a terminal:
pip install -r requirements.txt
2. How to Run Jupyter Notebook:
A. Click the button above to download and unzip `Product_Recommendation_Engine_Package.zip`.
B. Install Anaconda if Python/Jupyter Notebook is NOT already installed.
C. Open Anaconda Navigator.
D. Click 'Launch' under Jupyter Notebook.
E. In the Jupyter browser window, open the unzipped folder.
F. Open `Product_Recommendation_Engine_Workbook.ipynb`.
G. Run Cells 1 and 2 of the notebook.
NOTE: The current code uses sample data and the provided template. Please follow the instructions below to prepare the required data for your own use case.
3. Product Features: The first step is to define and break down the product features. In this case study, we use the example of an online shop in the US that wants to build a product recommendation engine; please see 'Product_catalog.csv' in the folder. As shown below, these features should be structured in a comma-separated .csv file. The top row lists the feature names, such as Item_id, Product_name, Category, Price, Margin_band, Average_rating, Stock_status, and Product_age_days. The remaining rows contain the data for each product.
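As a minimal sketch of the layout described above, the snippet below reads a catalog with Python's standard `csv` module and checks that the header row contains the listed feature names. The helper name `load_catalog` and the values in the sample row are assumptions made for illustration; they are not taken from the notebook or from the sample data.

```python
import csv
import io

# Column names taken from the feature list in this section.
EXPECTED_COLUMNS = [
    "Item_id", "Product_name", "Category", "Price",
    "Margin_band", "Average_rating", "Stock_status", "Product_age_days",
]

def load_catalog(handle):
    """Read catalog rows and fail early if an expected column is missing.

    `load_catalog` is a hypothetical helper, not part of the notebook.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return list(reader)

# Made-up row for illustration only. For a real file, use:
#   with open("Product_catalog.csv", newline="") as handle: ...
sample = io.StringIO(
    "Item_id,Product_name,Category,Price,Margin_band,"
    "Average_rating,Stock_status,Product_age_days\n"
    "ITEM001,Stainless Saucepan,Kitchen,29.99,High,4.5,In stock,120\n"
)
catalog = load_catalog(sample)
print(catalog[0]["Product_name"])
```

Checking the header before running the model cells makes a misnamed column show up as a clear error instead of a failure later in the notebook.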
4. Customer Features: The second step is to define and break down the customer features. In this case study, the customer features of the online shop should be structured in a comma-separated .csv file; please see 'Customer_segments.csv' in the folder. As with the product catalog, the top row lists the feature names (see the file for the exact columns) and the remaining rows contain the data.
5. Customer and Product Interactions: The third dataset contains customer-product interactions, which are the main data points capturing how customers interact with specific products on the website. In the online shop example, each row represents one customer-product event, such as a view, click, add to cart, purchase, or review. Please see 'customer_product_interactions.csv'. The dataset includes the following features:
- User_ID is the unique customer identifier.
- Item_ID is the unique product identifier.
- Event_Type is the type of event, such as View, Add to Cart, or Purchase.
- Event_Strength is the interaction strength score based on business context. For example, View = 1, Add to Cart = 4, and Purchase = 8.
- Quantity is the number of times a given event occurs within a single website visit, that is, within one Session_ID.
- Unit_Price is the price of each product unit.
- Interaction_Datetime is the date and time of the interaction.
- Session_ID is the unique website visit identifier for a given customer.
Cell 3 of the notebook loads the CSV datasets. Cells 4 and 5 handle data manipulation and preparation.
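To illustrate how Event_Strength can turn raw events into a single score per customer and product, here is a minimal sketch. It uses the example weights from the text (View = 1, Add to Cart = 4, Purchase = 8) and simply sums them per (User_ID, Item_ID) pair. Summing is an assumption made for illustration; the notebook may aggregate differently, for example by also weighting by Quantity. The function name and the sample events are hypothetical.

```python
from collections import defaultdict

# Example weights from the text; adjust them to your business context.
EVENT_STRENGTH = {"View": 1, "Add to Cart": 4, "Purchase": 8}

def aggregate_interactions(events):
    """Sum the event strength for each (user, item) pair.

    `events` is an iterable of (user_id, item_id, event_type) tuples.
    Hypothetical helper; the notebook's own aggregation may differ.
    """
    totals = defaultdict(int)
    for user_id, item_id, event_type in events:
        totals[(user_id, item_id)] += EVENT_STRENGTH[event_type]
    return dict(totals)

# Made-up events for illustration only.
events = [
    ("U001", "ITEM001", "View"),
    ("U001", "ITEM001", "Add to Cart"),
    ("U001", "ITEM001", "Purchase"),
    ("U002", "ITEM001", "View"),
]
scores = aggregate_interactions(events)
print(scores[("U001", "ITEM001")])  # 1 + 4 + 8 = 13
```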
6. Model Build: Cells 6, 7, 8, and 9 build the model from the data and create the top product recommendations for each customer. The model filters out products the customer has already interacted with in the training data.
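The filtering step can be sketched as follows. This is not the notebook's model: it assumes some model has already produced a score per product for one customer, and only shows how products the customer has already interacted with might be removed before the top N are returned. The function name, the scores, and the customer history are hypothetical.

```python
def top_n_unseen(item_scores, seen_items, n=10):
    """Return the n highest-scoring items the customer has not interacted with.

    `item_scores` maps item ID to a model score (higher is better);
    `seen_items` is the set of item IDs from the training interactions.
    """
    ranked = sorted(item_scores.items(), key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in ranked if item not in seen_items][:n]

# Made-up scores and history for illustration only.
item_scores = {"ITEM001": 0.9, "ITEM011": 0.7, "ITEM059": 0.8, "ITEM020": 0.2}
seen = {"ITEM001"}
print(top_n_unseen(item_scores, seen, n=2))  # ['ITEM059', 'ITEM011']
```

Filtering after ranking keeps the scoring logic independent of each customer's history, so the same scores can be reused if the exclusion rule changes.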
7. Find Top 10 Similar Products: In cell 10, the model identifies products that are often preferred by customers with similar interests. These products are therefore more likely to appear in the same basket. For example, the table below shows products similar to ITEM001, ranked by similarity score. The Similarity Score is a value between 0 and 1, where 0 indicates the lowest similarity and 1 indicates the highest.
As expected, ITEM001 has the highest similarity with itself, with a score of 1. In second and third place are ITEM059 and ITEM011, with similarity scores of 0.35 and 0.30, respectively. This suggests that customers interested in buying ITEM001 (Stainless Saucepan) may also be interested in ITEM059 (Travel Umbrella) and ITEM011 (Cotton Towel Set).
Note: The sample data is synthetic.
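One common way to obtain a similarity score between 0 and 1 is cosine similarity over each product's customer interaction strengths. The text does not say which measure the notebook uses, so the sketch below is an assumption for illustration only. With non-negative strengths the score stays in the 0 to 1 range, and a product always scores 1 against itself. The function names and the interaction vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two items.

    `a` and `b` map user ID to interaction strength for one item each.
    """
    shared_users = set(a) & set(b)
    dot = sum(a[user] * b[user] for user in shared_users)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similar_items(target, vectors, n=10):
    """Rank all items by similarity to `target`, highest first."""
    scores = {
        item: cosine_similarity(vectors[target], vector)
        for item, vector in vectors.items()
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:n]

# Made-up interaction strengths for illustration only.
item_vectors = {
    "ITEM001": {"U001": 8, "U002": 1},
    "ITEM059": {"U001": 4, "U003": 8},
    "ITEM011": {"U003": 1},
}
for item, score in similar_items("ITEM001", item_vectors):
    print(item, round(score, 2))
```

In this toy data, ITEM011 shares no customers with ITEM001, so its score is 0, while ITEM059 shares one customer and lands in between.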
8. Category Mix: The chart below shows the category mix of the sample recommendations.
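A category mix like this can be computed by counting the category of each recommended product and converting the counts into shares. The sketch below assumes a list of recommended item IDs and an item-to-category lookup built from the product catalog; the function name and the category values are made up for illustration.

```python
from collections import Counter

def category_mix(recommended_items, item_category):
    """Return the share of each category among the recommended items."""
    counts = Counter(item_category[item] for item in recommended_items)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: count / total for category, count in counts.items()}

# Made-up lookup and recommendations for illustration only.
item_category = {
    "ITEM001": "Kitchen",
    "ITEM002": "Kitchen",
    "ITEM011": "Home",
    "ITEM059": "Accessories",
}
recommendations = ["ITEM001", "ITEM011", "ITEM059", "ITEM002"]
mix = category_mix(recommendations, item_category)
print(mix)
```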
9. Output and Integration with Website: The model should be refreshed with new data based on business requirements, data availability, and the type of business. Depending on the use case, it may be updated daily, weekly, or monthly. The file 'aggregated_user_item_interactions.csv' is used as part of the recommendation workflow for a given customer. Ideally, the entire process should be automated.
COMING SOON!! The End-to-End solution will be a fully automated system. It will integrate with a website, pull fresh data every night, and retrain the model using the latest available data. The model will then rank products for each customer and push the recommendations back to the website. Based on business requirements and website specifications, the solution can be configured for different update schedules and deployment needs. There will also be a range of algorithms available to determine which one best fits the data and the use case.