By Yuli Vasiliev | November 2021
Your organization has tons of transaction data—and creates more data every day. There’s invaluable knowledge in those database records! Association analysis lets your organization benefit from this data, extracting useful information from it. This article explains how to take advantage of this technique to identify relationships between items in transaction data and turn them into actionable insights.
After a brief discussion of the basics of association analysis, you'll see some examples of applying association analysis techniques to a sample set of transaction data in Python and in Oracle Database for comparison. This article assumes some knowledge of Python and Oracle Database, such as being able to install libraries with pip.
Association analysis discovers the probability of the co-occurrence of items in a collection. The strength of associations between co-occurring items is expressed through metrics attached to association rules. In essence, association rules are IF-THEN statements that express the probability of relationships between items. An association rule is denoted as X -> Y, where X is the IF component of the rule, called the antecedent, and Y is the THEN component, called the consequent. Or, to put it more plainly, association analysis tells you how likely it is that, if X occurs in a record in the dataset, Y will show up in the same record. In IF-THEN notation, the above rule would be denoted as IF X THEN Y.
Association rules are often used to analyze sales transaction data, helping retailers find relationships between the products that people frequently buy together. For example, the rule IF {bread, butter} THEN {cheese}, which might be found in the transaction data of a grocery store, indicates that shoppers who buy bread and butter in the same shopping trip are likely to also pick up cheese. This knowledge might drive coupons and special pricing, product placement, and inventory management. When the insights are not intuitively obvious, they can provide not only a way to increase revenue but also a competitive advantage.
There are several metrics for evaluating the strength of a rule. The most common association metrics are the following:

- Support: the ratio of transactions that contain a given item or itemset to the total number of transactions
- Confidence: how often the consequent appears in transactions that contain the antecedent
- Lift: the ratio of the rule's observed support to the support expected if the antecedent and the consequent were independent
First, you use the support metric to determine the frequent itemsets in your transaction set (a set of items that occurs in many transactions is considered frequent). The other metrics then evaluate association rules built from those frequent itemsets. The Apriori algorithm, developed by Rakesh Agrawal and Ramakrishnan Srikant in the 1990s, is designed for exactly that: determining frequent itemsets and evaluating association rules on them.
Using the Apriori algorithm consists of performing the following two steps:

1. Determine the frequent itemsets in the transaction set, that is, the itemsets whose support meets a user-specified threshold.
2. Generate association rules for those frequent itemsets and evaluate the rules with metrics such as confidence and lift.
Below is a simple example that illustrates how this algorithm can be applied in practice. The example uses the sample transaction dataset implemented as a Python list of lists, where each nested list represents a set of products found in a single transaction.
transactions = [
['cabbage', 'cucumber'], ['cabbage', 'eggplant', 'cucumber'], ['carrot', 'potato', 'pepper'],
['carrot', 'pepper'], ['carrot', 'tomato'], ['spinach', 'eggplant', 'pumpkin'], ['carrot', 'tomato', 'corn'],
['lettuce', 'leek'], ['cabbage', 'cucumber', 'spinach'], ['corn', 'broccoli', 'tomato'],
['radishes', 'potato'], ['carrot', 'potato'], ['radishes', 'beets', 'potato'], ['cabbage', 'beans'],
['carrot', 'potato', 'pepper'], ['carrot', 'cucumber', 'pepper'], ['artichoke', 'cucumber'],
['cabbage', 'cucumber'], ['carrot', 'beets'], ['onion', 'garlic']
]
This sample dataset consists of 20 transactions, each of which contains two or three items. (In a real transaction dataset, there would likely be millions of transactions, many of them considerably larger; think of a typical store receipt covering a week of grocery shopping for a family.)
If you look through the transactions, you may notice that some products appear more often than others and that some tend to appear together. For example, there are five transactions with cabbage and six transactions with cucumber, four of which contain both cabbage and cucumber. Let's now look at how to calculate the main metrics for the cabbage -> cucumber association rule.
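As a quick check, you can verify these counts directly in Python. The following is a minimal sketch that uses only the transactions list defined above; the variable names are mine and will be reused shortly:

# Count the transactions containing cabbage, cucumber, and both
n_cabbage = sum('cabbage' in t for t in transactions)                   # 5
n_cucumber = sum('cucumber' in t for t in transactions)                 # 6
n_both = sum('cabbage' in t and 'cucumber' in t for t in transactions)  # 4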
As mentioned, the support of an item is the ratio of transactions that include this item to the total number of transactions. So, the support for cabbage in the example being discussed can be calculated as follows:
support(cabbage) = cabbage/total -> 5/20 -> 0.25
For itemsets, the support is the ratio of transactions in which an itemset occurs to the total number of transactions. Thus, you could use the following formula to calculate the support for the cabbage & cucumber itemset defined in the example:
support(cabbage & cucumber) = (cabbage & cucumber)/total -> 4/20 -> 0.2
A support metric falls in the range of 0 to 1. The Apriori algorithm uses a user-specified support threshold to determine if an itemset can be considered frequent. For example, if you set the support threshold to 0.1, any set of items that appear together in at least 10% of all the transactions will be considered frequent.
Once the set of frequent itemsets has been determined, you can evaluate association rules on them. Confidence and lift are the primary metrics. Turning back to the example, confidence can be interpreted as the likelihood that a shopper who buys cabbage also buys cucumber. So, the confidence for the cabbage -> cucumber association rule can be calculated as follows:
confidence(cabbage -> cucumber) = (cabbage & cucumber)/cabbage -> 4/5 -> 0.8
Confidence falls in the range of 0 to 1. This metric is not symmetric: in this example, the confidence for cabbage -> cucumber differs from the confidence for cucumber -> cabbage.
confidence(cucumber -> cabbage) = (cabbage & cucumber)/cucumber -> 4/6 -> 0.67
Confidence is a measure of performance for an association rule, indicating how often the consequent follows the antecedent.
As mentioned, lift is another important metric for evaluating association rules. The lift of the association rule cabbage -> cucumber is the ratio of the observed support for cabbage -> cucumber to the support that would be expected if cabbage and cucumber were independent of each other. It can be calculated as follows:
lift(cabbage -> cucumber) = support(cabbage & cucumber)/(support(cabbage)*support(cucumber)) -> 0.2/(0.25*0.3) -> 2.67
The range of possible values for lift is [0, ∞). A lift greater than 1 indicates that the relationship between the antecedent and the consequent is stronger than would be expected if they were independent.
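To tie the formulas together, here is a minimal sketch that recomputes all three metrics, reusing the counts from the earlier snippet (again, the variable names are illustrative, not part of any library):

total = len(transactions)                                     # 20 transactions

support_cabbage = n_cabbage / total                           # 0.25
support_cucumber = n_cucumber / total                         # 0.30
support_both = n_both / total                                 # 0.20

# confidence(cabbage -> cucumber) = support(both) / support(cabbage)
confidence = support_both / support_cabbage                   # 0.8

# lift(cabbage -> cucumber) = support(both) / (support(cabbage) * support(cucumber))
lift = support_both / (support_cabbage * support_cucumber)    # ~2.67
print(confidence, round(lift, 2))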
A set of association rules can be generated automatically against your transaction data. In Python, this can be done with libraries such as Mlxtend, which provides an implementation of the Apriori algorithm. You can install Mlxtend with pip, as follows:
pip install mlxtend
After that, you can apply the Apriori algorithm to the transaction dataset, which was defined as a Python list of lists earlier. The first step is to transform the dataset into a format suitable for further processing. When using Mlxtend, you need to convert the dataset into a one-hot encoded array of transactions. The following snippet does that, saving the encoded array of transactions in a pandas DataFrame:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

# One-hot encode the transactions: one row per transaction,
# one Boolean column per distinct product
encoder = TransactionEncoder()
encoded_array = encoder.fit(transactions).transform(transactions)
df_itemsets = pd.DataFrame(encoded_array, columns=encoder.columns_)
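If you want to confirm that the encoding worked as expected, you can check the dimensions of the resulting DataFrame (this quick check is my addition):

# 20 transactions (rows) by 19 distinct products (columns)
print(df_itemsets.shape)   # (20, 19)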
Next, you can extract the frequent itemsets from the encoded array. This can be done using the apriori() function from the mlxtend.frequent_patterns module, as follows:
from mlxtend.frequent_patterns import apriori
frequent_itemsets = apriori(df_itemsets, min_support=0.1, use_colnames=True)
In this particular example, the min_support parameter is set to 0.1 to return the items and itemsets with at least 10% support. As a result, the content of the frequent_itemsets DataFrame should look as follows:
support itemsets
0 0.10 (beets)
1 0.25 (cabbage)
2 0.40 (carrot)
3 0.10 (corn)
4 0.30 (cucumber)
5 0.10 (eggplant)
6 0.20 (pepper)
7 0.25 (potato)
8 0.10 (radishes)
9 0.10 (spinach)
10 0.15 (tomato)
11 0.20 (cucumber, cabbage)
12 0.20 (pepper, carrot)
13 0.15 (carrot, potato)
14 0.10 (carrot, tomato)
15 0.10 (corn, tomato)
16 0.10 (pepper, potato)
17 0.10 (potato, radishes)
18 0.10 (pepper, potato, carrot)
Now that you have the items and itemsets that meet the specified support threshold, you can proceed to the next step of the Apriori algorithm and generate association rules for the itemsets. The Mlxtend library provides the association_rules() function for doing that.
from mlxtend.frequent_patterns import association_rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)
If you look at the generated rules, however, you may notice that they include more metrics than support, confidence, and lift. To minimize confusion, if you want to view only those three, you can explicitly select the corresponding columns:
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])
The command above should output the following rule set:
antecedents consequents support confidence lift
0 (cucumber) (cabbage) 0.20 0.666667 2.666667
1 (cabbage) (cucumber) 0.20 0.800000 2.666667
2 (pepper) (carrot) 0.20 1.000000 2.500000
3 (carrot) (pepper) 0.20 0.500000 2.500000
4 (potato) (carrot) 0.15 0.600000 1.500000
5 (tomato) (carrot) 0.10 0.666667 1.666667
6 (corn) (tomato) 0.10 1.000000 6.666667
7 (tomato) (corn) 0.10 0.666667 6.666667
8 (pepper) (potato) 0.10 0.500000 2.000000
9 (radishes) (potato) 0.10 1.000000 4.000000
10 (pepper, potato) (carrot) 0.10 1.000000 2.500000
11 (pepper, carrot) (potato) 0.10 0.500000 2.000000
12 (carrot, potato) (pepper) 0.10 0.666667 3.333333
13 (pepper) (carrot, potato) 0.10 0.500000 3.333333
As you can see, a separate rule is generated for each direction of an association, such as cabbage -> cucumber and cucumber -> cabbage. Although support and lift are symmetric, confidence is not: if you swap the antecedent and the consequent of a rule, the confidence may change.
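If you want to scan the strongest associations first, you might sort the rules by lift; this optional step is my addition and is not required for what follows:

# Show the strongest associations at the top
print(rules.sort_values('lift', ascending=False)[['antecedents', 'consequents', 'lift']])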
Now that you have generated a set of association rules against your transaction data, how can you take advantage of it and turn it into actionable insights? The following section discusses how you can make recommendations for customers based on the purchase history summarized in the metrics that compose association rules.
Making recommendations based on the items that customers have already added to their basket is a common technique used by retailers to show customers cross-sell items they might want to purchase. To generate such recommendations, you need to identify the items that customers buy together frequently. This is where association rules come in handy, providing metrics for identifying items frequently bought together.
To start, you might want to filter the rules by a certain metric. For example, you might filter the collection of rules using a threshold on the lift metric. In the following code snippet, the lift threshold is set to 2.5. I suggest you also rename the antecedents and consequents columns to make it clear that the recommendation list for an antecedent is composed of its most common consequents.
recommendations = (rules[rules['lift'] >= 2.5][['antecedents', 'consequents']]
                   .rename(columns={'antecedents': 'items', 'consequents': 'recommendation'})
                   .reset_index(drop=True))
So, the content of the recommendations DataFrame should look as follows:
items recommendation
0 (cucumber) (cabbage)
1 (cabbage) (cucumber)
2 (pepper) (carrot)
3 (carrot) (pepper)
4 (corn) (tomato)
5 (tomato) (corn)
6 (radishes) (potato)
7 (potato, pepper) (carrot)
8 (potato, carrot) (pepper)
9 (pepper) (potato, carrot)
Looking through the rows above, you may notice that there are rows with duplicate values in the items column. For example, you have two rows for pepper, which is not appropriate for a recommendation system. Here is how to combine such duplicates into a single row.
# Convert frozensets to lists so the recommendation lists can be concatenated
recommendations['recommendation'] = recommendations['recommendation'].apply(list)
# Merge rows that share the same antecedent, then drop duplicates within each merged list
recommendations = (recommendations.groupby('items', sort=False)['recommendation']
                   .apply(lambda lists: sum(lists, [])).reset_index())
recommendations['recommendation'] = recommendations['recommendation'].apply(lambda x: list(set(x)))
After that, the rows in the recommendations DataFrame should look similar to the following (the row order may vary):
items recommendation
0 (cucumber) [cabbage]
1 (cabbage) [cucumber]
2 (carrot) [pepper]
3 (pepper) [carrot, potato]
4 (tomato) [corn]
5 (corn) [tomato]
6 (radishes) [potato]
7 (carrot, potato) [pepper]
8 (potato, pepper) [carrot]
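With the table in this shape, a small lookup helper is all you need to serve cross-sell suggestions. The following sketch is my own illustration built on the recommendations DataFrame above; recommend() is a hypothetical helper, not a library function:

# Hypothetical helper: return the recommendation list for a single basket item
def recommend(item):
    mask = recommendations['items'].apply(lambda s: s == frozenset([item]))
    hits = recommendations.loc[mask, 'recommendation']
    return hits.iloc[0] if not hits.empty else []

print(recommend('pepper'))   # e.g., ['carrot', 'potato']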
Oracle Database allows you to perform association analysis with the Apriori algorithm. Use the DBMS_DATA_MINING package, a component of the Oracle Advanced Analytics option to Oracle Database Enterprise Edition. (In Oracle Database 21c, Oracle Data Mining was rebranded as Oracle Machine Learning for SQL. The PL/SQL package name, however, has not changed and remains DBMS_DATA_MINING.)
Here is a simple example of using DBMS_DATA_MINING to generate association rules against a set of transaction data. To begin, you need to have a table or view with transaction data. In its simplest form, such a table might look like the one created below.
CREATE TABLE transactions (
trans_id NUMBER(10),
prod_name VARCHAR2(20)
);
The trans_id column is the case identifier column in the table. In this context, a case is the collection of items that make up a single transaction.
Once you have created the transactions table, you need to populate it with data. For this, you can execute the INSERT statements found in the transactions.sql file accompanying this article.
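The file itself is not reproduced here, but to give a sense of the format, the first two Python transactions from earlier would translate into rows like the following hypothetical excerpt (one row per item in a transaction):

-- Hypothetical excerpt from transactions.sql: one row per item
INSERT INTO transactions (trans_id, prod_name) VALUES (1, 'cabbage');
INSERT INTO transactions (trans_id, prod_name) VALUES (1, 'cucumber');
INSERT INTO transactions (trans_id, prod_name) VALUES (2, 'cabbage');
INSERT INTO transactions (trans_id, prod_name) VALUES (2, 'eggplant');
INSERT INTO transactions (trans_id, prod_name) VALUES (2, 'cucumber');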
You might also want to prepare a market_settings table to override some default settings of the association model that you're going to create next.
CREATE TABLE market_settings AS
  SELECT *
  FROM TABLE(DBMS_DATA_MINING.GET_DEFAULT_SETTINGS)
  WHERE setting_name LIKE 'ASSO_%';
Then, you can change the default values of the settings found in the market_settings table and build the association model, all in a single PL/SQL block.
BEGIN
  UPDATE market_settings
     SET setting_value = TO_CHAR(0.1)
   WHERE setting_name = DBMS_DATA_MINING.asso_min_support;

  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'market_model',
    mining_function     => DBMS_DATA_MINING.ASSOCIATION,
    data_table_name     => 'transactions',
    case_id_column_name => 'trans_id',
    target_column_name  => NULL,
    settings_table_name => 'market_settings');
END;
/
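To verify that the model was built, you might query the USER_MINING_MODELS data dictionary view. This check is my addition rather than part of the original workflow; note that the model name is stored in uppercase by default.

-- Optional sanity check: the new model should be listed here
SELECT model_name, mining_function, algorithm
FROM user_mining_models
WHERE model_name = 'MARKET_MODEL';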
You can view the association rules generated for the transaction data found in the transactions table with the help of the DBMS_DATA_MINING.GET_ASSOCIATION_RULES table function, as illustrated in the following example:
SELECT A.attribute_str_value antecedent,
       C.attribute_str_value consequent,
       rule_support support,
       rule_confidence confidence,
       rule_lift lift
FROM TABLE(DBMS_DATA_MINING.GET_ASSOCIATION_RULES('market_model')) R,
     TABLE(R.antecedent) A,
     TABLE(R.consequent) C
WHERE rule_lift >= 2.5 AND rule_confidence >= 0.5;
The query above should produce the following output:
ANTECEDENT CONSEQUENT SUPPORT CONFIDENCE LIFT
---------- ---------- ------- -------------- --------------
pepper carrot .2 1 2.5
radishes potato .1 1 4
corn tomato .1 1 6.66666667
potato carrot .1 1 2.5
pepper carrot .1 1 2.5
cabbage cucumber .2 .8 2.66666667
cucumber cabbage .2 .666666667 2.66666667
potato pepper .1 .666666667 3.33333333
carrot pepper .1 .666666667 3.33333333
tomato corn .1 .666666667 6.66666667
carrot pepper .2 .5 2.5
After an item has appeared in a customer basket, which item is going to appear there next? Of course, you cannot say for sure, but you can make a prediction based on the history of customer purchases, using association analysis. In this article, you saw how this type of analysis can be performed on a sample set of transactions using both Python and Oracle Database.
Illustration: Wes Rowell
Yuli Vasiliev (@VasilievYuli) is a programmer, freelance writer, and consultant specializing in open source development, Oracle Database technologies, and natural-language processing (NLP). He is the author of Natural Language Processing with Python and spaCy.