Free Python Code and Instructions
for New Product Development
As outlined on the New Product Development page, there are two approaches to experimental design and data collection: primary data and secondary data. Our main focus here is the primary approach, while also highlighting selected considerations relevant to the secondary approach.
The free version and sample data are available as a ZIP download, allowing users to get started immediately. Although we explain the process step by step, we recommend that users have a basic understanding of Python and Jupyter Notebook. One of the main advantages of the End-to-End solution is that non-technical users do not need to manage the process manually, as everything is handled automatically.
1. Jupyter Notebook Setup: The easiest way to run the notebook is with Anaconda, as it installs Python, Jupyter Notebook, and common data analysis packages together.
Useful links:
- Download Anaconda:
https://www.anaconda.com/download
- Anaconda installation guide:
https://www.anaconda.com/docs/getting-started/anaconda/install
- Jupyter Notebook installation/help:
https://jupyter.org/install
If you already use Python, install the required packages by running the following command in a terminal (for example, bash) from the unzipped folder:
pip install -r requirements.txt
2. How to Run Jupyter Notebook:
A. Click the button above to download and unzip `website_download_package.zip`.
B. Install Anaconda if Python/Jupyter Notebook is NOT already installed.
C. Open Anaconda Navigator.
D. Click 'Launch' under Jupyter Notebook.
E. In the Jupyter browser window, open the unzipped folder.
F. Open `Choice_Based_Conjoint_Analysis_Workbook.ipynb`.
G. Run Cells 1 and 2 of the notebook.
NOTE: The current code uses sample data and the provided template. Please follow the instructions below to prepare the required data for your own use case.
3. Defining Features: The first step is to identify and break down all possible features of a product or service, regardless of whether you are using primary or secondary data. In this case study, we use the example of a US outpatient healthcare provider seeking to identify the combination of features that maximises customer utility. Please see 'Feature_template.csv' in the folder. As shown below, these features should be organised in a comma-separated CSV file. The top row contains the feature names, such as Monthly membership fee, while the rows below define the possible levels or specifications, such as $200, $130, $80, and $50. Each feature can have a different number of levels. For instance, Care provider facility may include just two levels: Hospital and Local clinic.
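As a minimal sketch of how such a file can be read, the snippet below parses a two-feature excerpt using only the levels named above. It assumes that columns with fewer levels are padded with empty cells; the function name `read_feature_levels` is hypothetical and is not part of the downloadable notebook.

```python
import csv
import io

# Two-column excerpt built only from the levels named in the text.
# Assumption: shorter columns are padded with empty cells.
SAMPLE_CSV = """Monthly membership fee,Care provider facility
$200,Hospital
$130,Local clinic
$80,
$50,
"""


def read_feature_levels(handle):
    """Map each feature name (top row) to its list of non-empty levels."""
    reader = csv.reader(handle)
    header = next(reader)
    levels = {name: [] for name in header}
    for row in reader:
        for name, cell in zip(header, row):
            cell = cell.strip()
            if cell:  # skip the padding under shorter columns
                levels[name].append(cell)
    return levels


feature_levels = read_feature_levels(io.StringIO(SAMPLE_CSV))
for name, values in feature_levels.items():
    print(f"{name}: {len(values)} levels -> {values}")
```

To read the real file, pass `open('Feature_template.csv', newline='')` instead of the in-memory sample.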
4. Experiment Design: Based on the example features above, the full set of possible scenarios for the experiment, that is, the full factorial design, would be 4 × 3^4 × 2 = 648 (one feature with 4 levels, four features with 3 levels, and one feature with 2 levels). This is far too many to use in a real choice-based conjoint experiment, so we need to select an optimal subset that captures the required data while keeping the size of the experiment reasonable. We use a method called 'D-optimal fractional factorial design', which generates 36 scenarios for this example. Our Python code takes the feature information and returns the optimal design; however, you need to provide the feature specifications in CSV file format.
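The arithmetic above can be checked with a short sketch. The level counts for Care coordination & follow-up and Visit access options are inferred from the 3^4 term rather than stated directly, so treat them as an assumption.

```python
from itertools import product
from math import prod

# Number of levels per feature. The two categorical 3-level entries are
# inferred from the 4 x 3^4 x 2 formula, not stated explicitly.
level_counts = {
    "Monthly membership fee": 4,
    "Time to next available appointment": 3,
    "Out-of-pocket cost per visit": 3,
    "Care coordination & follow-up": 3,   # inferred
    "Visit access options": 3,            # inferred
    "Care Provider Facility": 2,
}

# Size of the full factorial design: the product of all level counts.
full_size = prod(level_counts.values())

# Enumerate every scenario as a tuple of level indices to confirm the count.
scenarios = list(product(*(range(n) for n in level_counts.values())))

print(full_size)        # 648
print(len(scenarios))   # 648
print(f"A 36-run design uses {36 / full_size:.1%} of the full factorial")
```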
To set up the design, each feature is defined as a separate row in the CSV file 'Feature_specs.csv'. The file includes five key columns: Features, Feature_Type, Level Number, Minimum, and Maximum.
Features are the variables you want to test and optimise, for example Monthly membership fee, Time to next available appointment, Out-of-pocket cost per visit, Care coordination & follow-up, Visit access options, and Care Provider Facility.
Feature Type defines the kind of variable being tested:
- Continuous features can take values across a range. For example, Monthly membership fee ($) is defined as a continuous feature with 4 levels between 50 and 200.
- Discrete features can only take specific numeric values. For example, Time to next available appointment (days) is defined as a discrete feature with 3 levels between 3 and 14.
- Categorical features represent named options or service configurations rather than numeric values e.g. Care coordination & follow-up, Visit access options, and Care Provider Facility.
Level Number specifies how many variations of each feature should be tested in the experiment. This determines how many settings will be generated for each feature. For example, Out-of-pocket cost per visit ($) has 3 levels, while Care Provider Facility has 2 levels.
For numeric features (Continuous or Discrete), the Minimum and Maximum columns define the range to be tested. The experiment setup then uses the selected number of levels to generate values across that range. For example:
- Monthly membership fee ($): 4 levels from 50 to 200
- Time to next available appointment (days): 3 levels from 3 to 14
- Out-of-pocket cost per visit ($): 3 levels from 25 to 150
For categorical features, the Minimum and Maximum fields are left blank, because these features are defined by categories rather than numeric ranges.
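The sketch below illustrates one way a specification row could be expanded into level values. It assumes evenly spaced levels and whole-number rounding for discrete features; the rule the notebook actually applies is not stated here and may differ. The helper `spaced_levels` is hypothetical.

```python
def spaced_levels(minimum, maximum, count, discrete=False):
    """Return `count` evenly spaced values from minimum to maximum inclusive."""
    if count < 2:
        raise ValueError("a numeric feature needs at least two levels")
    step = (maximum - minimum) / (count - 1)
    values = [minimum + i * step for i in range(count)]
    # Assumption: discrete features are rounded to whole numbers.
    return [round(v) for v in values] if discrete else values


# Rows mirror the Feature_specs.csv columns described above.
specs = [
    {"Features": "Monthly membership fee ($)", "Feature_Type": "Continuous",
     "Level Number": 4, "Minimum": 50, "Maximum": 200},
    {"Features": "Time to next available appointment (days)",
     "Feature_Type": "Discrete",
     "Level Number": 3, "Minimum": 3, "Maximum": 14},
    {"Features": "Care Provider Facility", "Feature_Type": "Categorical",
     "Level Number": 2, "Minimum": None, "Maximum": None},
]

generated = {}
for spec in specs:
    if spec["Feature_Type"] == "Categorical":
        continue  # categorical levels are named options, not a numeric range
    generated[spec["Features"]] = spaced_levels(
        spec["Minimum"],
        spec["Maximum"],
        spec["Level Number"],
        discrete=spec["Feature_Type"] == "Discrete",
    )

for name, values in generated.items():
    print(name, values)
```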
5. Optimal Design: When you run the Python code in Cell 3, it returns an optimal design CSV file. In the healthcare provider example, this file contains 36 feature combinations. Please see 'fractional_factorial_design.csv' in the folder.
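To give a feel for what 'D-optimal' means, the sketch below scores candidate 36-run designs by log det(X'X), where X is the dummy-coded design matrix, and keeps the best of several random draws. This is a plain random search on the D-criterion and is not the algorithm used in Cell 3. The level counts are inferred from the 4 × 3^4 × 2 formula, every feature is treated as categorical for coding purposes, and all function names are hypothetical.

```python
import math
import random
from itertools import product

LEVEL_COUNTS = [4, 3, 3, 3, 3, 2]  # inferred from 4 x 3^4 x 2 = 648
N_RUNS = 36


def encode(run):
    """Intercept plus dummy coding, dropping the first level of each feature."""
    row = [1.0]
    for level, count in zip(run, LEVEL_COUNTS):
        row.extend(1.0 if level == k else 0.0 for k in range(1, count))
    return row


def log_det_information(design):
    """Return log det(X'X) via Gaussian elimination, or -inf if singular."""
    rows = [encode(run) for run in design]
    p = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    total = 0.0
    for col in range(p):
        # Partial pivoting: pick the largest remaining entry in this column.
        pivot = max(range(col, p), key=lambda idx: abs(m[idx][col]))
        if abs(m[pivot][col]) < 1e-9:
            return float("-inf")
        m[col], m[pivot] = m[pivot], m[col]
        total += math.log(abs(m[col][col]))
        for r in range(col + 1, p):
            factor = m[r][col] / m[col][col]
            for c in range(col, p):
                m[r][c] -= factor * m[col][c]
    return total


def random_search_design(n_trials=50, seed=0):
    """Keep the random 36-run subset with the highest D-criterion."""
    rng = random.Random(seed)
    candidates = list(product(*(range(n) for n in LEVEL_COUNTS)))
    best_design, best_score = None, float("-inf")
    for _ in range(n_trials):
        design = rng.sample(candidates, N_RUNS)
        score = log_det_information(design)
        if score > best_score:
            best_design, best_score = design, score
    return best_design, best_score


design, score = random_search_design()
print(len(design), "runs, log det(X'X) =", round(score, 3))
```

A dedicated exchange algorithm would normally find a better design than random search; the point here is only to show the quantity being maximised.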
6. Data Collection: If you are using secondary data, such as data on choices made by users on a website, please go to the Data Structure section.
If you are collecting primary data, the feature combinations above should be presented to customers in a multiple-choice format. This may be done using either a paper-based questionnaire or an online survey platform such as SurveyMonkey or Qualtrics. Below is a sample of the AI Insights for Success End-to-End solution for paid members (coming soon).
7. Data Structure: If you are not using the AI Insights for Success End-to-End solution for paid members, which automates the full workflow, you will need to prepare the data in a format that can be read by our Python package. Each option in a choice set is coded 1 if the participant selected it and 0 otherwise. For example, Participant_id 1 selected run_id_3 from the first set of four choices. In this case, run_id_3 is coded as 1, while run_id_1, run_id_2, and run_id_4 are coded as 0 because they were not selected. If Participant_id 3 selected none, then run_id_1, run_id_2, run_id_3, and run_id_4 would all be coded as 0. Please see the synthetic healthcare provider example in 'Survey_response.csv'.
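The 0/1 coding rule described above can be sketched as follows. The helper `code_choice_set` is hypothetical, and the exact column layout of 'Survey_response.csv' should be taken from the sample file rather than from this snippet.

```python
def code_choice_set(run_ids, selected=None):
    """Code each option 1 if it was selected and 0 otherwise.

    Pass selected=None when the participant chose none of the options.
    """
    if selected is not None and selected not in run_ids:
        raise ValueError(f"{selected!r} is not part of this choice set")
    return {run_id: int(run_id == selected) for run_id in run_ids}


choice_set = ["run_id_1", "run_id_2", "run_id_3", "run_id_4"]

# Participant 1 selected run_id_3 from the first set of four choices.
participant_1 = code_choice_set(choice_set, selected="run_id_3")

# Participant 3 selected none of the four options.
participant_3 = code_choice_set(choice_set, selected=None)

print(participant_1)
print(participant_3)
```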
8. Choice-Based Conjoint Analysis: Once the file is ready and placed in the folder, you can run Cells 4 and 5 to view the optimal design, influential features, and visualisations.
The synthetic data example below shows that the time to next available appointment is the most important feature, and that longer wait times have a negative effect on customers. In addition, customers prefer to pay a higher monthly fee rather than incur higher out-of-pocket costs.