Free Python Code and Instructions
for New Product Sales Forecasting
The free version and sample data are available for download as a ZIP file, and users can start using them right away. Although we explain everything step by step, we recommend that users have a basic understanding of Python and Jupyter Notebook. One of the main advantages of the End-to-End solution is that non-technical users do not need to follow a complex process manually, as everything is handled automatically.
1. Jupyter Notebook Setup: The easiest way to run the notebook is with Anaconda, as it installs Python, Jupyter Notebook, and common data analysis packages together.
Useful links:
- Download Anaconda:
https://www.anaconda.com/download
- Anaconda installation guide:
https://www.anaconda.com/docs/getting-started/anaconda/install
- Jupyter Notebook installation/help:
https://jupyter.org/install
If you already use Python, install the required packages by running the following command in a terminal:
pip install -r requirements.txt
2. How to Run Jupyter Notebook:
A. Click the button above to download and unzip `Forecasting_package.zip`.
B. Install Anaconda if Python/Jupyter Notebook is NOT already installed.
C. Open Anaconda Navigator.
D. Click 'Launch' under Jupyter Notebook.
E. In the Jupyter browser window, open the unzipped folder.
F. Open `Forecasting_CBC_Workbook.ipynb`.
G. Run Cells 1 and 2 of the notebook.
NOTE: The current code uses sample data and the provided template. Please follow the instructions below to prepare the required data for your own use case.
3. Defining Features: The first step is to define and break down all possible features of a product or service. In this case study, we use the example of Nothing Technology, which wants to forecast sales of its new mobile device in the US (see 'Feature_template.csv' in the folder). As shown below, these features should be structured in a comma-separated .csv file. The top row lists the feature names, such as Brand, while the rows underneath define the possible levels or specifications, such as Apple, Samsung, Google, Nothing, and Others. Each feature can have a different number of levels. For example, a feature such as Folding may include just two levels: Yes and No.
4. Experiment Design: Based on the example features above, the full factorial design (every possible combination of feature levels) contains 5 x 4^4 x 3 x 2 = 7,680 scenarios, which is far too many to use in a real choice-based conjoint experiment. We therefore need an optimal subset of combinations that captures the required data while keeping the size of the experiment reasonable. We use a method called 'D-optimal fractional factorial design', which generates 60 combinations (scenarios). Our Python code takes the feature information and returns the optimal design; however, you need to provide the variable specifications in CSV format.
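The full factorial count quoted above can be checked with a few lines of Python. This is only an arithmetic sketch: the level counts for Brand, Price, Camera, Storage, Battery Life, and Folding come from the text, while the 4 levels for Screen Size are inferred from the 5 x 4^4 x 3 x 2 formula and are therefore an assumption.

```python
import math

# Number of levels per feature. Screen Size = 4 is inferred from the
# formula in the text, not stated explicitly (assumption).
level_counts = {
    "Brand": 5,
    "Price": 4,
    "Screen Size": 4,
    "Camera": 4,
    "Storage": 4,
    "Battery Life": 3,
    "Folding": 2,
}

# The full factorial size is the product of the level counts.
full_factorial_size = math.prod(level_counts.values())
print(full_factorial_size)  # 7680
```

Adding one more level to any feature multiplies this number again, which is why a fractional design is used instead of the full factorial.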
To set up the experiment, each feature is defined as a separate row in the CSV file 'Feature_specs.csv'. The file includes five key columns: Features, Feature_Type, Level Number, Minimum, and Maximum.
Features are the variables you want to test and optimise e.g. Brand, Price, Screen Size, Camera, Storage, Folding, and Battery Life.
Feature Type defines the kind of variable being tested:
- Continuous features can take values across a range. For example, Price ($) is defined as a continuous feature with 4 levels between 300 and 1300.
- Discrete features can only take specific numeric values. For example, Storage (GB) is defined as a discrete feature with 4 levels between 128 and 1024.
- Categorical features represent named options or service configurations rather than numeric values, e.g. Brand and Folding.
Level Number specifies how many variations of each feature should be tested in the experiment. This determines how many settings will be generated for each feature. For example, Price ($) has 4 levels, while Folding has 2 levels.
For numeric features (Continuous or Discrete), the Minimum and Maximum columns define the range to be tested. The experiment setup then uses the selected number of levels to generate values across that range. For example:
- Price ($): 4 levels from 300 to 1300
- Camera Pixel (MP): 4 levels from 20 to 200
- Battery Life (days): 3 levels from 1 to 5
For categorical features, the Minimum and Maximum fields are left blank, because these features are defined by categories rather than numeric ranges.
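To make the role of the five columns more concrete, here is a minimal sketch that reads a specification in this layout and generates levels between Minimum and Maximum. The inline CSV text, the function names, the Discrete type given to Battery Life, and above all the evenly spaced rule are assumptions for illustration. The notebook may place levels differently (for example, the 512 GB storage level mentioned later in this guide would not come out of even spacing).

```python
import csv
import io

# Hypothetical extract in the layout described above. The exact header
# spelling in the real 'Feature_specs.csv' is an assumption, as is the
# Discrete type for Battery Life.
SPEC_CSV = """Features,Feature_Type,Level Number,Minimum,Maximum
Price ($),Continuous,4,300,1300
Storage (GB),Discrete,4,128,1024
Battery Life (days),Discrete,3,1,5
Folding,Categorical,2,,
"""

def evenly_spaced(minimum, maximum, n_levels):
    """Return n_levels values from minimum to maximum with equal steps."""
    step = (maximum - minimum) / (n_levels - 1)
    return [minimum + i * step for i in range(n_levels)]

def build_levels(spec_text):
    """Map each feature name to its generated numeric levels.

    Categorical features map to None because their levels are names
    listed elsewhere, not values derived from a range.
    """
    levels = {}
    for row in csv.DictReader(io.StringIO(spec_text)):
        if row["Feature_Type"] == "Categorical":
            levels[row["Features"]] = None
            continue
        values = evenly_spaced(
            float(row["Minimum"]), float(row["Maximum"]), int(row["Level Number"])
        )
        if row["Feature_Type"] == "Discrete":
            values = [round(v) for v in values]  # discrete levels as whole numbers
        levels[row["Features"]] = values
    return levels

levels = build_levels(SPEC_CSV)
for feature, values in levels.items():
    print(feature, values)
```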
5. Optimal Design: When you run the Python code in Cell 3, it returns an optimal design as a CSV file. In this example, the file contains 60 feature combinations. Please see 'fractional_factorial_design.csv' in the folder.
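The design routine in Cell 3 is not reproduced here. As a rough illustration of the idea behind a D-optimal design, the sketch below picks a subset of runs from a small full factorial by random exchange, keeping a swap whenever it increases det(X'X) for a dummy-coded design matrix. All function names, the toy level counts (3, 2, 2), and the run count of 8 are hypothetical, and the random-exchange heuristic is an assumption rather than the algorithm the package actually uses.

```python
import itertools
import random

def dummy_code(profile, level_counts):
    """Intercept plus dummy columns; level 0 of each feature is the baseline."""
    row = [1.0]
    for level, n_levels in zip(profile, level_counts):
        row.extend(1.0 if level == k else 0.0 for k in range(1, n_levels))
    return row

def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [r[:] for r in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0  # singular matrix
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def d_criterion(rows):
    """det(X'X) for the design matrix made of the given coded rows."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    return determinant(xtx)

def random_exchange_design(level_counts, n_runs, n_iter=2000, seed=0):
    """Search for n_runs distinct profiles with a large det(X'X)."""
    rng = random.Random(seed)
    candidates = list(itertools.product(*(range(n) for n in level_counts)))
    coded = [dummy_code(c, level_counts) for c in candidates]
    chosen = rng.sample(range(len(candidates)), n_runs)
    best = d_criterion([coded[i] for i in chosen])
    for _ in range(n_iter):
        position = rng.randrange(n_runs)
        new = rng.randrange(len(candidates))
        if new in chosen:
            continue  # keep the runs distinct
        trial = chosen[:]
        trial[position] = new
        score = d_criterion([coded[i] for i in trial])
        if score > best:  # accept only improving swaps
            chosen, best = trial, score
    return [candidates[i] for i in chosen], best

design, score = random_exchange_design([3, 2, 2], n_runs=8)
for run in design:
    print(run)
```

A larger determinant means the chosen runs carry more information about each level's effect, which is how a design of 60 runs can stand in for thousands of possible combinations.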
6. Data Collection: The features above should be presented to customers in a multiple-choice format. This can be done using either a paper-based questionnaire or an online survey platform such as SurveyMonkey or Qualtrics. Below is the AI Insights for Success End-to-End solution for paid members — COMING SOON!
7. Data Structure: If you are not using the AI Insights for Success End-to-End solution for paid members, which automates the full workflow, you will need to prepare the data in a format that can be read by our Python package. For example, suppose Participant_id 1 selected run_id_3 from the first set of four choices. In this case, run_id_3 is coded as 1, while run_id_1, run_id_2, and run_id_4 are coded as 0 because they were not selected. If Participant_id 3 selected none of the options, then run_id_1, run_id_2, run_id_3, and run_id_4 would all be coded as 0. Please see the synthetic example in 'Survey_response.csv'.
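The 0/1 coding described above can be sketched as follows. The helper name `encode_choice` and the one-row-per-choice-set layout are assumptions for illustration; check 'Survey_response.csv' for the exact columns the package expects.

```python
def encode_choice(participant_id, shown_runs, selected=None):
    """Return one row with a 0/1 flag per shown run.

    selected=None means the participant chose none of the options,
    so every run is coded 0.
    """
    row = {"Participant_id": participant_id}
    for run in shown_runs:
        row[run] = 1 if run == selected else 0
    return row

shown = ["run_id_1", "run_id_2", "run_id_3", "run_id_4"]
rows = [
    encode_choice(1, shown, selected="run_id_3"),  # picked the third option
    encode_choice(3, shown, selected=None),        # picked none
]
for row in rows:
    print(row)
```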
8. Choice-Based Conjoint Analysis: Once the file is ready and placed in the folder, you can run Cells 4 and 5 to view the most influential features and the accompanying visualisations.
The synthetic data example below shows that Brand is the most important feature, and that a 6.5-inch screen, a 200 MP camera, and 512 GB of storage are the most influential levels within their features.
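A common way to express feature importance in conjoint analysis is to take the range of the estimated level effects (part-worths) within each feature and divide it by the sum of ranges across all features. The sketch below applies that rule to invented part-worths. The numbers are hypothetical, not results from the notebook or the sample data, and Cells 4 and 5 may compute importance differently.

```python
# Invented part-worths for illustration only (NOT notebook output).
part_worths = {
    "Brand": {"Level A": 0.9, "Level B": 0.5, "Level C": 0.1, "Level D": -0.3, "Level E": -1.2},
    "Folding": {"Yes": 0.2, "No": -0.2},
    "Battery Life (days)": {"1": -0.5, "3": 0.1, "5": 0.4},
}

def relative_importance(part_worths):
    """Share of each feature's part-worth range in the total of all ranges."""
    ranges = {
        feature: max(levels.values()) - min(levels.values())
        for feature, levels in part_worths.items()
    }
    total = sum(ranges.values())
    return {feature: value / total for feature, value in ranges.items()}

importance = relative_importance(part_worths)
most_important = max(importance, key=importance.get)
for feature, share in importance.items():
    print(f"{feature}: {share:.1%}")
```

Within a feature, the level with the highest part-worth is the most attractive one under the model.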
9. New Product Sales Probability: This section helps predict the purchase probability for new product scenarios, including combinations that were not shown in the original fractional factorial design. For example, your original survey may not have included a phone with the following features:
- Brand: Nothing
- Price: $700
- Screen Size: 6.7 inch
- Camera: 48 MP
- Storage: 512 GB
- Folding: No
- Battery Life: 3 days
The model can still estimate the probability of purchase for this new combination by using the feature-level effects learned from the survey responses. This output is a model-based probability score for each product profile. A total market sales forecast or revenue forecast can also be calculated if the market size for a given period is known.
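To illustrate how feature-level effects can be combined for a profile that was never shown to respondents, here is a minimal sketch assuming a binary logit model: the part-worths of the profile's levels are summed with an intercept and passed through the logistic function. The model form, the intercept, the part-worth values, and the market size are all hypothetical; the notebook's actual model and outputs may differ.

```python
import math

# Hypothetical log-odds contributions (NOT estimated from the sample data).
INTERCEPT = -3.0
PART_WORTHS = {
    ("Brand", "Nothing"): 0.2,
    ("Folding", "No"): 0.1,
    ("Storage", "512 GB"): 0.3,
}

def purchase_probability(profile, part_worths, intercept):
    """Sum the part-worths of the profile's levels, then apply the logistic function."""
    utility = intercept + sum(part_worths.get(item, 0.0) for item in profile.items())
    return 1.0 / (1.0 + math.exp(-utility))

# A reduced version of the new profile, using only the features coded above.
new_phone = {"Brand": "Nothing", "Folding": "No", "Storage": "512 GB"}
probability = purchase_probability(new_phone, PART_WORTHS, INTERCEPT)

# Hypothetical market size for the period; replace with your own estimate.
ASSUMED_MARKET_SIZE = 1_000_000
expected_units = probability * ASSUMED_MARKET_SIZE

print(f"Predicted purchase probability: {probability:.2%}")
print(f"Expected units for the assumed market size: {expected_units:,.0f}")
```

Multiplying the expected units by a price would give a revenue figure under the same assumptions.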
Before running Cell 6 of the Jupyter Notebook, the new product specifications must be defined in CSV format in 'Forecasting_Model_Spec.csv'. Please use the template shown below.
The predicted probability shows the relative likelihood that respondents may choose or purchase each product scenario, based on the survey data and the model. A higher probability means that the scenario is more attractive according to the model. For example, if the output shows:
- Scenario 1: 3.37%
- Scenario 2: 4.48%
- Scenario 3: 2.70%
Then Scenario 2 has the highest predicted purchase probability among the tested scenarios.
This does not mean that exactly 4.48% of the entire market will buy the product. It means that, based on the survey responses and the feature-level effects learned by the conjoint model, Scenario 2 is predicted to perform better than the other scenarios tested.
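Because the probabilities are best read relative to one another, it can help to rank the scenarios and express each one as a fraction of the best. The short sketch below does this with the three example figures above; the variable names are illustrative only.

```python
# Example probabilities (in percent) from the text above.
predicted = {"Scenario 1": 3.37, "Scenario 2": 4.48, "Scenario 3": 2.70}

# Identify the top scenario and scale the others against it.
best = max(predicted, key=predicted.get)
relative_to_best = {
    name: value / predicted[best] for name, value in predicted.items()
}

# Print the scenarios from most to least attractive.
for name in sorted(predicted, key=predicted.get, reverse=True):
    print(f"{name}: {predicted[name]:.2f}% ({relative_to_best[name]:.0%} of the best scenario)")
```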