Bernoulli Model
The Bernoulli model begins by specifying a time frame (say, one month or one year) over which a claim can occur. Next, it models the occurrence of a claim as a random variable with two possible outcomes: claim or no claim. Such a variable follows a Bernoulli distribution.
When the specified time frame is subdivided into subintervals, with each subinterval modeled as its own Bernoulli variable, the resulting analysis is called survival analysis, or event analysis.
Fitting to Data
To fit the Bernoulli variable to data, one must first construct a dataset whose observation period matches that of the model. That is, if the model is designed to forecast an event over a one-month time frame, each observation in the dataset must cover a one-month period and include a variable indicating the presence or absence of the event.
Once the dataset is constructed, the probability of the event occurring can be estimated simply by averaging the indicator variable (coded 1 when the event is present and 0 when it is absent) across all observations (see fitting a Bernoulli).
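The averaging step can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the `claims` list is made-up data with one entry per policy-month, and the function name `fit_bernoulli` is hypothetical.

```python
# Hypothetical data: one entry per one-month observation period,
# 1 = a claim occurred in that period, 0 = no claim.
claims = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]


def fit_bernoulli(observations):
    """Estimate the event probability as the mean of the 0/1 indicator.

    For a Bernoulli variable the sample mean is also the
    maximum-likelihood estimate of the probability.
    """
    if not observations:
        raise ValueError("need at least one observation")
    return sum(observations) / len(observations)


p_hat = fit_bernoulli(claims)
print(f"Estimated monthly claim probability: {p_hat:.2f}")
```

The estimate is only as good as the match between the observation period in the data and the period the model is meant to forecast.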
Factors Influencing Probability
When modeling an insurance contract, it is often necessary to treat the claim probability as being influenced by a set of exogenous factors. Sometimes this can be accomplished simply by splitting the observational dataset into separate datasets, one for each value of the factor.
For instance, when calculating the probability of death as a function of age, one can simply split the data into a mortality table. That is, each age bracket is considered separately, and a probability is calculated for each group.
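A split of this kind might look like the sketch below. The records and age brackets are invented for illustration, and `rates_by_group` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical records: (age_bracket, died_within_period), where the
# second field is 1 if the person died during the period and 0 otherwise.
records = [
    ("40-49", 0), ("40-49", 0), ("40-49", 1), ("40-49", 0),
    ("50-59", 0), ("50-59", 1), ("50-59", 1), ("50-59", 0),
]


def rates_by_group(rows):
    """Fit a separate Bernoulli probability for each group by averaging."""
    totals = defaultdict(int)   # number of observations per group
    events = defaultdict(int)   # number of events per group
    for group, event in rows:
        totals[group] += 1
        events[group] += event
    return {group: events[group] / totals[group] for group in totals}


mortality_table = rates_by_group(records)
for bracket in sorted(mortality_table):
    print(bracket, mortality_table[bracket])
```

Each group is fitted from its own observations only, so the estimate for a bracket is based on far fewer data points than the full dataset contains.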
Splitting the dataset can be problematic for the following reasons.
- The dataset may not be large enough to split along every dimension and still yield a good estimate for each group.
- The factor in question could be continuous. In some cases, such as age, it is acceptable to split the continuous variable into a set of buckets; in other cases it may be harder.
When there is a structure to the relationship between the event and the underlying factor (such as a linear relationship), regression techniques can be employed to tease out the relationship. In the case of a Bernoulli variable, logistic regression is often used.
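As a rough illustration of the regression approach, the sketch below fits a one-factor logistic regression by plain gradient ascent on the log-likelihood. The data, the function names, and the choice of optimizer and step size are all assumptions made for this example; in practice a statistics library would normally be used instead.

```python
import math


def sigmoid(z):
    """Map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Fit P(event | x) = sigmoid(a + b * x) by gradient ascent
    on the mean log-likelihood. Returns the pair (a, b)."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals: observed outcome minus predicted probability.
        residuals = [y - sigmoid(a + b * x) for x, y in zip(xs, ys)]
        grad_a = sum(residuals) / n
        grad_b = sum(r * x for r, x in zip(residuals, xs)) / n
        a += learning_rate * grad_a
        b += learning_rate * grad_b
    return a, b


# Hypothetical data: x is a continuous factor (centred around zero),
# y is 1 if the event occurred during the observation period.
xs = [-2, -2, -1, -1, 0, 0, 1, 1, 2, 2]
ys = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]

a, b = fit_logistic(xs, ys)
print(f"intercept={a:.3f}, slope={b:.3f}")
print(f"P(event | x=1.5) = {sigmoid(a + b * 1.5):.3f}")
```

Unlike the split-by-group approach, the fitted curve gives a probability for any value of the factor, including values such as 1.5 that never appear in the data, at the cost of assuming the relationship has the logistic form.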