Bernoulli Model
The Bernoulli model begins by specifying a time frame (say, one month or one year) over which a claim can occur. Next, it models the occurrence of a claim as a random variable with two possible outcomes (claim or no claim). As such, the variable follows a Bernoulli distribution.
When the specified time frame is subdivided into sub-intervals, with each interval modeled as a Bernoulli variable, the resulting analysis is called survival and event analysis.
Fitting to Data
To fit the Bernoulli variable to data, one must first construct a dataset whose observation period matches that of the modeling variable. That is, if the model is designed to forecast an event over a one-month time frame, the dataset must consist of observations of one-month periods, each with a variable indicating the presence (1) or absence (0) of the event.
Once the dataset is constructed, the probability of the event occurring can be estimated simply by averaging the dataset variable representing the event (see fitting a Bernoulli).
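As a minimal sketch of this averaging step, assume a small made-up list of one-month observations coded 1 for a claim and 0 for no claim. The data and the function name `fit_bernoulli` are illustrative only; the sample mean it returns is the maximum-likelihood estimate of the Bernoulli parameter.

```python
# Hypothetical one-month observations: 1 = claim occurred, 0 = no claim.
claims = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]


def fit_bernoulli(observations):
    """Estimate the event probability as the mean of the 0/1 indicator."""
    if not observations:
        raise ValueError("need at least one observation")
    return sum(observations) / len(observations)


p_hat = fit_bernoulli(claims)
print(p_hat)  # 2 claims in 10 periods
```

The estimate is only as reliable as the number of observed periods allows; with few observations, the average can sit far from the true probability.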
Factors Influencing Probability
When modeling an insurance contract, it is often necessary to treat the claim probability as being influenced by a set of exogenous factors. Sometimes this can be accomplished simply by splitting the observational dataset into separate datasets, each representing a different value of the factor.
For instance, when calculating the probability of death as a function of age, one can split the data by age bracket to form a mortality table. That is, each age bracket is considered separately and a probability is calculated for each group.
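The bracket-by-bracket calculation might look like the following sketch. The records, the 20-year bracket width, and the helper names `bracket` and `rates_by_bracket` are assumptions made for illustration, not part of any standard mortality table construction.

```python
from collections import defaultdict

# Hypothetical records: (age at start of period, 1 if death occurred else 0).
records = [(34, 0), (37, 0), (52, 0), (58, 1), (71, 1), (76, 0), (79, 1), (31, 0)]


def bracket(age, width=20):
    """Return the inclusive (lower, upper) age bracket containing `age`."""
    lower = (age // width) * width
    return (lower, lower + width - 1)


def rates_by_bracket(records, width=20):
    """Fit a separate Bernoulli probability to each age bracket."""
    counts = defaultdict(lambda: [0, 0])  # bracket -> [events, observations]
    for age, event in records:
        tally = counts[bracket(age, width)]
        tally[0] += event
        tally[1] += 1
    return {b: events / n for b, (events, n) in sorted(counts.items())}


table = rates_by_bracket(records)
for (lower, upper), rate in table.items():
    print(f"{lower}-{upper}: {rate:.3f}")
```

Each bracket's estimate rests only on the observations that fall inside it, so narrow brackets quickly leave very little data per group.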
Splitting the dataset can be problematic for the following reasons:
- The dataset may not be large enough to split along every dimension and still yield a reliable estimate for each group.
- The factor in question could be continuous. In some cases, such as age, it is acceptable to split the continuous variable into a set of buckets; in other cases, choosing sensible buckets may be harder.
When there is a structure to the relationship between the event and the underlying factor (such as a linear relationship), regression techniques can be employed to tease out that relationship. In the case of a Bernoulli variable, logistic regression is often used.
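To make the idea concrete, here is a rough sketch of a one-factor logistic regression, in which the event probability is modeled as sigmoid(a + b*x). The data are invented, and the hand-written gradient ascent is a stand-in chosen to keep the example dependency-free; in practice a statistics library would normally be used instead. The names `fit_logistic` and `predict` are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit p(event | x) = sigmoid(a + b*x) by gradient ascent
    on the mean Bernoulli log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(a + b * x)
            grad_a += err
            grad_b += err * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


# Hypothetical data: one observation per person, event coded 0/1.
ages = [30, 35, 40, 45, 50, 55, 60, 65, 70, 75]
events = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1]

# Center the ages and express them in decades so gradient ascent behaves well.
mean_age = sum(ages) / len(ages)
xs = [(age - mean_age) / 10 for age in ages]

a, b = fit_logistic(xs, events)


def predict(age):
    """Estimated event probability at a given age."""
    return sigmoid(a + b * (age - mean_age) / 10)


print(f"slope per decade: {b:.3f}")
print(f"p(event | age 40) = {predict(40):.3f}")
print(f"p(event | age 70) = {predict(70):.3f}")
```

Unlike the bracket approach, every observation contributes to the estimate at every age, at the cost of assuming that the log-odds of the event vary linearly with the factor.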