Monte Carlo Simulation with Python

Kevin Barreiro
6 min readDec 10, 2021

Introduction

There are occasions where an individual or group may desire to analyze the risk associated with a particular endeavor. Monte Carlo simulation plays a crucial role in such a process and facilitates the assessment of risk; the simulation involves generating random values for the uncertain inputs of a model, thus simulating values for the output variable of interest. The output variable is simulated many times. Thereafter, the simulations for the output variable are analyzed with statistical tools to make inferences about the situation.

For the sake of this post, we will implement a model to simulate net revenue for a hotel by using Monte Carlo simulation with any uncertain variables in Python. Begin by importing the necessary libraries and setting the Seaborn library theme.

import numpy as np
import pandas as pd
import scipy.stats as st
import matplotlib.pyplot as plt
import seaborn as sns
import math
sns.set_theme(style="ticks", palette="ch:s=.25,rot=-.25")

Model

NR = MIN{CA, RA} * RP - OS - OC
CA = RM - CN
OS = MAX{0, CA - RA}

where,
NR = net revenue
CA = customer arrivals
RA = rooms available
RP = room price
OS = overbooked customers
OC = overbookng cost
RM = reservations made
RL = reservations limit
CD = customer demand
CN = cancellations

The variable of interest is net revenue, namely the model’s output.

Known Data

The following model inputs have known values assigned, respectively.

RA = 300  # rooms available
RP = 120 # room price
OC = 100 # overbooking cost
RL = 310 # reservation limit
demand_μ = 320 # mean for customer demand
demand_σ = 15 # standard deviation for customer demand
cancellation_p = 0.04 # probability of cancellation

Simulated Data

The following model inputs will have values assigned that are simulated by using the most appropriate probability distribution to generate a random value, i.e. Monte Carlo simulation.

Customer Demand (CD)

Generate a random number according to a normal distribution to simulate a demand for the model.

# random number generator
rng = np.random.default_rng()
# generate random number from normal distribution
CD = rng.normal(loc=demand_μ, scale=demand_σ)

Reservations Made (RM)

RM = min(RL, CD)

Cancellations (CN)

Generate a random number according to binomial distribution to simulate the number of customers that cancelled a reservation.

CN = rng.binomial(n=RM, p=cancellation_p)

Customer Arrivals (CA)

CA = RM - CN

Overbooked Customers (OS)

OS = max(0, CA - RA)

Net Revenue (NR)

NR = (min(CA, RA) * RP) - (OS * OC)

The output of printing NR will be different each time to code is executed given that we are generating a random variate for some of the variables in the model.

Currently we can only make one simulation for net revenue. It would be nice to simulate net revenue multiple times to oberve the ensuing distribution of net revenue and infer several statistics. For the sake of this notebook, we will simulate net revenue 1000 times for different reservation limits. To do this we will put all of the code written thus far, except for the import statements, inside of a function that returns all of the simulated net revenue values, given the reservation limit(s).

def net_revenue(n=0, RA=0, RP=0, OC=0, RL=[],loc=0, scale=0, CNP=0):
"""
net_revenue simulates net revenue for
the hotel model described in the notebook,
given a list of reservation limits (RL).
:param n: net revenue simulations to make per reservation limit
:param RA: rooms available
:param RP: room price
:param OC: overbooking cost
:param RL: reservation limits
:param loc: mean for customer demand
:param scale: standard deviation for customer demand
:param CNP: probability of cancellation
:return: pandas.DataFrame containing n simulated net revenue
values for each reservation limit.
"""

# Simulated net revenue (NR) values for
# each reservation limit (RL)
net_revenue_df = pd.DataFrame({})

# Simulated Data
for i in range(len(RL)):
NR_arr = [] # Array of n net revenue simulations

# Simulate n net revenue values for
# the reservation limit RL[i]
for s in range(n):
# random number generator
rng = np.random.default_rng()
# random number from normal distribution
# for customer demand
CD = rng.normal(loc=loc, scale=scale)
# reservations made
RM = min(RL[i], CD)
# random number from binomial distribution
# for cancellations
CN = rng.binomial(n=RM, p=cancellation_p)
# customer arrivals
CA = RM - CN
# overbooked customers
OS = max(0, CA - RA)
# net revenue
NR = round((min(CA, RA) * RP) - (OS * OC), 2)
# Populate NR_arr
NR_arr.append(NR)

# Add the simulated net revenue array for
# the current reservation limit to net_revenue_df
net_revenue_df[RL[i]] = pd.Series(data=NR_arr)

return net_revenue_df

We will conduct 2,500 simulations for each reservation limit.

simulation = net_revenue(
n=2500,
RA=300,
RP=120,
OC=100,
RL=[300, 305, 310, 315, 320, 325, 330],
loc=320,
scale=15,
CNP=0.04
)
print(simulation)
Net revenue simulations by reservations limits.

Keep in mind that your simulation will look different than the screen above consequent to the rand variate we are generating for the uncertain variables.

Simulation Results

We will write code to analyze the net revenue simulations and make statistical inferences. Again, keep in mind that any screenshot of graphs or plots displayed in this blog post will most likely be different than yours consequent to the random variate we are generating for the uncertain variables.

Summary Statistics

stats = simulation.describe()
print(stats)
sns.boxplot(data=simulation)
plt.xlabel("Net Revenue")
plt.ylabel("Reservation Limit")
plt.title("Net Revenue Simulation Box Plot")
plt.show()
A boxplot for the simulated net revenue values by reservation limits.

Confidence Interval

ci_df = pd.DataFrame({})for index, rl in enumerate(simulation.keys()):
ci_df[rl] = st.t.interval(alpha=0.95,
df=simulation.count()[rl]-1,
loc=simulation.mean()[rl],
scale=st.sem(simulation[rl]))
ci_df = ci_df.transpose()
ci_df = ci_df.rename(columns={0: "lower CI", 1: "upper CI"})
ci_df["standard error"] = [st.sem(simulation[rl]) for i, rl in enumerate(simulation.keys())]
ci_df.transpose()
Confidence intervals with standard error for each reservation limit.

Append the ci_df DataFrame with the confidence intervals to the stats DataFrame containing summary statistics.

stats = round(stats.append(ci_df.transpose()), 2)
print(stats)
Summary statistics for the net revenue simulations with the confidence interval information appended.

Percentiles

Now we will find percentiles for the simulated net revenue values in 5% increments.

percentiles = np.arange(0, 1.05, 0.05) 
net_revenue = [round(np.percentile(simulation, q * 100), 2) for q in percentiles]
net_revenue = pd.DataFrame({
"Percentile": percentiles,
"Net Revenue": net_revenue
})
print(net_revenue)
Net revenue percentile table.

By analyzing the percentiles table we can infer that there approximately a 55% probability that the net revenue will be greater than $35,000 per our model. Other such inferences can be made from the percentiles table.

Frequency Distribution

bins = np.arange(20000, 50000, 2500)
hist, bins = np.histogram(simulation, bins=bins)
freq_dist = pd.DataFrame({
"Net Revenue": pd.Series(bins),
"Frequency": pd.Series(hist)
})
print(freq_dist)
Frequency distribution of net revenue.

By analyzing the frequency distribution table we can infer that the majority of the simulated net revenue occur between $25,000 and $35,000. We can plot the frequency distribution table to a histogram to communicate the finding.

hist = sns.barplot(
x=freq_dist["Net Revenue"],
y=freq_dist["Frequency"],
palette="rocket").hist.set_xlim(2, 8)
print(hist)
Histogram for the simulated net revenue values.

Conclusion

I hope you found this post useful. Have a good day!

Best regards,

Kevin Barreiro [themastersdev]

--

--

Kevin Barreiro

Writing about topics related to Information Systems.