Data mining is frequently described as "the process of extracting valid, authentic, and actionable information from large databases." In other words, data mining derives patterns and trends that exist in data. These patterns and trends can be collected and defined as a mining model. Mining models can be applied to specific business scenarios, such as:
- Forecasting sales.
- Targeting mailings toward specific customers.
- Determining which products are likely to be sold together.
- Finding sequences in the order that customers add products to a shopping cart.
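To make the third scenario concrete, the following is a minimal sketch of one simple way to look for products that tend to be sold together: counting how often each pair of products appears in the same basket. The transaction data and the `pair_counts` helper are hypothetical and exist only for illustration; a real mining model would use a dedicated algorithm and far more data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one customer's basket.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair always appears in one canonical order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```

In this toy data the pair that co-occurs most often is bread and butter, which appears in three of the four baskets. Raw co-occurrence counts are only a starting point, because they favor products that are popular overall.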
An important concept is that building a mining model is part of a larger process that includes everything from defining the basic problem that the model will solve, to deploying the model into a working environment. This process can be defined by using the following six basic steps:
- Defining the Problem
- Preparing Data
- Exploring Data
- Building Models
- Exploring and Validating Models
- Deploying and Updating Models
Although the process that is illustrated in the diagram is circular, one step does not necessarily lead directly to the next. Creating a data mining model is a dynamic and iterative process. After you explore the data, you may find that it is insufficient to create the appropriate mining models, and that you therefore have to look for more data. You may build several models and realize that they do not answer the question posed when you defined the problem, and that you therefore must redefine the problem. You may have to update the models after they have been deployed because more data has become available. Each step in the process may therefore be repeated as many times as needed to create a good model.