Start With Why

Banks, Retail, Medicine: Who Uses Data Mining and for What

The term Data Mining is being used more and more frequently, but sometimes it’s confused with Big Data. RBC Trends explains how data mining works, why it’s a whole science, and how much data miners earn.

What is Data Mining

Data Mining (also called data extraction, intelligent data analysis, or in-depth data analysis) is a process companies use to turn raw big data into useful information. A less common name for the technology is “knowledge discovery in databases”, or KDD.

The term Big Data covers all large data sets, both processed and unprocessed, whereas Data Mining is the process of digging deep into that data to extract key knowledge.

Gregory Piatetsky-Shapiro, who is often credited with coining the term Data Mining, defined it as the process of discovering previously unknown, non-trivial, practically useful, and interpretable knowledge in raw data that is needed for decision-making in various spheres of human activity.

Using software to find patterns in large data sets, businesses can build marketing strategies, manage credit risk, detect fraud, filter spam, and even gauge user sentiment.

Intelligent data analysis depends on effective data collection, storage, and computer processing. Data Mining is considered a separate discipline in the field of data science.

The term “data mining” appeared in academic journals as early as 1970, but it became truly popular only in the 1990s, after the advent of the internet. At that point companies needed to analyze large volumes of heterogeneous data to find non-trivial patterns and learn to predict customer behavior. Conventional statistical models proved unable to handle this task.

The first Data Mining systems were designed to process supermarket sales data across several parameters, including sales volume by region and product type.

Data Mining Tasks

Data mining models are applied to several types of tasks:

- Forecasting: estimating sales, predicting server load or server downtime;
- Risk and probability: selecting the most suitable customers for targeted mailings, determining the break-even point for risky scenarios, assigning probabilities to diagnoses or other outcomes;
- Recommendations: determining which products are likely to sell together, generating recommendations;
- Sequence search: analyzing customer choices during purchases, predicting their behavior;
- Grouping: dividing customers or events into clusters, analyzing and predicting common features of these clusters.
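
The "products that will sell together" task can be illustrated with a minimal sketch that counts how often pairs of items share a basket. The baskets, item names, and the `pair_rules` helper are invented for illustration, and the support and confidence measures are one common way to score such pairs, not a method the article prescribes.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; every item name is invented for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
    {"bread", "butter", "tea"},
]

def pair_rules(baskets, min_support=0.3):
    """Return (antecedent, consequent, support, confidence) for frequent pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of all baskets that contain both items
        if support < min_support:
            continue
        # Confidence of "a -> b": of the baskets with a, the share that also has b.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Strongest rules first; ties are broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

rules = pair_rules(baskets)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

On this toy data the strongest rule is "butter -> bread": every basket with butter also contains bread. Real systems work on far larger item catalogs and typically need pruning to keep the pair count manageable.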

Where Data Mining is Applied

Data mining is used mainly by consumer-facing industries, including retail, finance, and marketing. For example, Sber offers a service called “SberAnalytics”, which provides data on market sectors or territories based on analysis of the population’s cash flows, sales of goods and services, and other parameters. Both companies and government agencies can use it to assess a region’s development potential.

Trade

For retail chains, Data Mining makes it possible to analyze shopping baskets in order to improve advertising, plan warehouse stock and shelf displays, open new stores, and identify the needs of different customer categories.

The Russian chain “Lenta” analyzed loyalty card data for more than 90% of its customers and divided its audience into segments based on purchasing behavior. In particular, the retailer identified a segment that buys only basic products and a segment of men who more often buy only drinks and snacks. This allowed it to optimize the assortment and manage shelf displays and prices. In October 2021, Amazon announced a tool that gives sellers access to information about what buyers are currently searching for, making it easier to choose products to sell.
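
Segmentation of this kind is often done with clustering. The sketch below is a bare-bones k-means over two invented features per customer (share of spend on basic products, share on drinks and snacks). The data, the feature choice, and the hand-picked starting centroids are assumptions made for illustration, and nothing here is claimed to be the retailer's actual method.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return labels, centroids

# Hypothetical customers: (share of spend on basics, share on drinks and snacks).
customers = [
    (0.90, 0.05), (0.85, 0.10), (0.80, 0.10),
    (0.10, 0.80), (0.15, 0.75), (0.20, 0.85),
]
# Starting centroids are hand-picked so the run is deterministic.
labels, centroids = kmeans(customers, [customers[0], customers[3]])
print(labels)
```

With these inputs the first three customers land in one segment and the last three in another. In practice the starting centroids are usually chosen at random and the number of segments has to be tuned.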

Banks and Telecom

For credit organizations, Data Mining makes it possible to detect credit card fraud by analyzing similar transactions and to offer different types of services to different groups of clients. Telecom companies use data analysis to combat spam and develop new tariffs for various groups of subscribers.
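
One simple way to compare a new card transaction with similar past ones is an outlier score. The sketch below uses a robust z-score based on the median and the median absolute deviation. The transaction history, the `is_suspicious` helper, and the 3.5 cut-off (a common rule of thumb) are illustrative assumptions; production fraud systems use many more signals than the amount alone.

```python
from statistics import median

def is_suspicious(history, amount, threshold=3.5):
    """Flag an amount whose robust z-score against past amounts exceeds threshold."""
    center = median(history)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median([abs(x - center) for x in history])
    if mad == 0:
        return amount != center  # all past amounts identical: any change stands out
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    robust_z = 0.6745 * abs(amount - center) / mad
    return robust_z > threshold

# Hypothetical past purchase amounts for one cardholder.
history = [12.0, 15.5, 9.9, 14.2, 11.3, 13.8, 10.5]

print(is_suspicious(history, 13.0))   # a typical amount
print(is_suspicious(history, 480.0))  # far outside the usual range
```

The median-based score is used here instead of the mean and standard deviation because a single earlier outlier in the history would otherwise inflate the spread and hide later anomalies.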

Russian mobile operators use Data Mining for internal purposes, and also offer data analysis as a product. For example, in 2020, “Beeline” launched a new service that allows companies to obtain demographic data of their clients through data mining on the databases collected by VimpelCom.

Insurance

Insurance companies analyze large volumes of data to identify risks and reduce their losses on liabilities, as well as offer relevant services to clients.

For example, big data analysis allowed the Australian private insurer HCF to cut its advertising mailing costs by 25% over four months. Analysts accurately identified the clients most likely to purchase a more expensive service and sent them a separate mailing.

Manufacturing

For enterprises, big data analysis makes it possible to align supply plans with demand forecasts, detect production problems at an early stage, and invest successfully in the brand. In addition, manufacturers can predict the wear of production assets and plan maintenance and repairs so that the production line does not have to stop. One example of Data Mining in industry is predicting product quality from the parameters of the technological process.
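
Wear prediction can be sketched, under strong simplifying assumptions, as fitting a straight line to past wear readings and extrapolating to the point where a wear limit is reached. The readings, the 5.0 mm limit, and the assumption of linear wear are all invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical sensor log: operating hours and measured tool wear in millimetres.
hours = [0, 100, 200, 300, 400]
wear_mm = [0.0, 0.5, 1.0, 1.5, 2.0]
wear_limit_mm = 5.0  # assumed point at which the part must be replaced

slope, intercept = fit_line(hours, wear_mm)
# Solve slope * h + intercept = limit for h to estimate when service is due.
hours_at_limit = (wear_limit_mm - intercept) / slope
print(f"Plan maintenance before about {hours_at_limit:.0f} operating hours")
```

Real equipment rarely wears linearly, so a sketch like this is only a starting point; the same fit-then-extrapolate idea carries over to richer models.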

The Russian company Jet Infosystems offers Jet Galatea, an intelligent decision support system. It analyzes process instructions and data coming from equipment sensors, then generates recommendations for process engineers on how best to run production. Jet Galatea is used in metallurgy, woodworking, agribusiness, and mining to reduce raw material consumption and increase output.

Sociology

Sentiment analysis based on social media data shows how a particular group of people feels about a specific topic. Since 2016, Russian police have used the “Zeus” system in some regions of the country. It tracks user behavior on social networks and builds a graph of each user’s environment, establishing possible connections between users by analyzing friends, relatives, indirect friends, places of residence, shared groups, likes, and reposts.

Medicine

Data Mining systems are also used in medical diagnosis. They are built on rules that describe combinations of symptoms of various diseases, and these rules help in choosing treatment methods. For example, the British startup Babylon Health collects information about clients’ health, lifestyle, and habits; an algorithm then builds hypotheses, suggests options for examination and treatment, and even recommends specific doctors and clinics.

Recommendation Systems

Such systems are designed to offer goods or services that are likely to interest people, and they are also used for customer support. They rely on data mining carried out in real time; put simply, the model is constantly updated. This is how voice assistants such as Amazon’s Alexa, Apple’s Siri, and Yandex’s “Alice” work. Another example is the DiDi taxi support service, where an algorithm resolves up to 60% of user requests, since they are often similar.

Data Mining Technology and Methods

There are several stages of data mining:

1. Problem statement. This step includes analysis of business requirements, defining the problem area, metrics by which the model will be evaluated, and defining tasks for the analysis project.
2. Data preparation: consolidation and cleaning. This work includes not only removing unnecessary data but also searching for hidden dependencies in it, determining the sources of the most accurate data, and creating a table for analysis.
3. Data exploration.
4. Building models.
5. Exploring and verifying models. The accuracy of their predictions can be checked using special tools.
6. Deploying and updating models. Once a model is in use, it must be updated as new data comes in and then reprocessed.
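
The six stages above can be walked through on a toy example. The sketch below assumes an invented task (predicting whether a client buys a premium service from monthly spend), invented records, and a deliberately simple cut-off model; the helper names are hypothetical, and each numbered comment maps to one stage.

```python
from statistics import mean

# 1. Problem statement (hypothetical): predict whether a client buys a premium
#    service from monthly spend; the evaluation metric is holdout accuracy.

# Invented raw records: (monthly_spend, bought_premium). None marks a gap.
raw = [
    (120, 0), (130, 0), (None, 1), (480, 1), (510, 1), (120, 0),
    (150, 0), (450, 1), (90, 0), (530, 1),
]

# 2. Data preparation: drop incomplete rows and exact duplicates, keeping order.
clean = list(dict.fromkeys(row for row in raw if None not in row))

# 3. Data exploration: compare average spend per class.
spend_by_class = {
    label: mean([x for x, y in clean if y == label]) for label in (0, 1)
}

def fit_threshold(rows):
    """4. Model building: place a cut-off halfway between the two class means."""
    low = mean([x for x, y in rows if y == 0])
    high = mean([x for x, y in rows if y == 1])
    return (low + high) / 2

def accuracy(threshold, rows):
    """5. Verification: share of rows where the cut-off predicts the label."""
    hits = sum((x > threshold) == bool(y) for x, y in rows)
    return hits / len(rows)

train, holdout = clean[:6], clean[6:]
threshold = fit_threshold(train)
holdout_accuracy = accuracy(threshold, holdout)

# 6. Deployment and updating: refit when new (again invented) records arrive.
new_records = [(200, 0), (400, 1)]
updated_threshold = fit_threshold(clean + new_records)

print(spend_by_class, threshold, holdout_accuracy, updated_threshold)
```

The model is intentionally trivial so that the flow between stages stays visible; in a real project each stage would be considerably more involved.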

What a Data Miner Should Know and Be Able to Do

A data mining specialist should have deep knowledge of mathematical statistics and be proficient in both foreign languages and programming languages. They process large volumes of information and look for connections in it. The specialist uses machine learning techniques, creates algorithms, and works with statistical analysis. The data miner then presents the results to the organization in an understandable format, and the company makes decisions based on these presentations.

Employers prefer Data Mining specialists with technical, mathematical, or natural science education. Universities offer corresponding fields of study: “Mathematics and Computer Science”, “Applied Mathematics and Informatics”, “Applied Informatics”, and “System Analysis and Management”. In addition, the basics of Data Mining can be studied in courses, for example, on Coursera.

According to the HeadHunter portal, in October 2021, data miners’ salaries in Russia ranged from ₽28,000 to ₽250,000.

Programs for Data Mining

There are many programs that can perform Data Mining tasks. Here are some examples:

- SAS Enterprise Miner
- Microsoft Analysis Services
- SAS Customer Intelligence 360
- SAS Credit Scoring
- Board
- SAS Revenue Optimization
- RapidMiner

The Future of Data Mining

The Data Mining systems market is growing. This is facilitated by the activities of large corporations: SAS, IBM, Microsoft, Oracle, and others. It is expected that by 2027 the volume of the global advanced analytics market will grow by 23.1% and reach $56.2 billion.

Recent trends in Data Mining include the development of analysis methods with elements of virtual and augmented reality, the integration of these methods with database systems, mining of biological data for innovations in medicine, web mining (analysis of data on the internet), real-time data analysis, and measures to protect privacy in data mining. Industry leaders believe that in the future data mining will be used in intelligent applications embedded in corporate data warehouses.

The main problem in discovering patterns in data is the time required to search through large arrays of information. Known methods either artificially limit the search or build entire decision trees, which reduces search efficiency. Solving this problem remains the main goal of developers of Data Mining products.
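
To give a sense of why search time dominates, consider one illustrative case that the article does not spell out: looking for groups of products bought together. The sketch below counts how many candidate item combinations exist, with and without an artificial cap on combination size. The `candidate_itemsets` helper and the catalog sizes are assumptions chosen for illustration.

```python
from math import comb

def candidate_itemsets(n_items, max_size=None):
    """Count non-empty item combinations, optionally capped at max_size items."""
    top = n_items if max_size is None else min(max_size, n_items)
    return sum(comb(n_items, k) for k in range(1, top + 1))

for n in (10, 20, 40):
    unrestricted = candidate_itemsets(n)         # equals 2**n - 1
    pairs_only = candidate_itemsets(n, max_size=2)
    print(f"{n} items: {unrestricted} candidates, {pairs_only} if capped at pairs")
```

The unrestricted count doubles with every added item, while the capped count grows only quadratically. That gap is the trade-off described above: limiting the search keeps it fast but may miss patterns that involve larger combinations.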