| Sr. No. | Date | Program List | Pg. No. | Sign |
|---|---|---|---|---|
| | | Intro to Pandas DataFrame | | |
How to Start Google Colab
How to Read CSV Datasets with Different Methods
Method 1: Uploading Files Directly
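You can upload CSV files from your local machine. In Colab, files.upload() opens a file picker and saves the selected file to the current working directory (/content), where pd.read_csv can read it by name.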
from google.colab import files
import pandas as pd
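# Open a file picker; the selected file is saved to the current directory (/content)
uploaded = files.upload()
# Load the uploaded CSV into a DataFrame (the name must match the file you uploaded)
df = pd.read_csv('purchased_data.csv')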
print(df.to_string())
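files.upload() returns a dictionary that maps each uploaded filename to its contents as bytes, so the file can also be read without hard-coding its name. The sketch below uses a hypothetical helper, load_first_upload, and a simulated upload dictionary so it runs outside Colab; inside Colab you would call load_first_upload(files.upload()) instead.

```python
import io

import pandas as pd


def load_first_upload(uploaded: dict) -> pd.DataFrame:
    """Read the first CSV in a {filename: bytes} dict, such as the one
    returned by google.colab.files.upload(). Hypothetical helper."""
    if not uploaded:
        raise ValueError("no file was uploaded")
    name, content = next(iter(uploaded.items()))
    print(f"Reading {name!r} ({len(content)} bytes)")
    # Wrap the raw bytes in a file-like object so read_csv can parse them
    return pd.read_csv(io.BytesIO(content))


# Simulated upload (made-up data) so the sketch runs outside Colab
fake_upload = {"purchased_data.csv": b"Item,Qty\nbread,2\nmilk,1\n"}
df_demo = load_first_upload(fake_upload)
print(df_demo.to_string())
```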
Method 2: Loading Data from a URL
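You can load datasets directly from a URL, such as a GitHub raw link. Use the raw file URL (raw.githubusercontent.com), not the github.com page URL, which returns an HTML page instead of CSV text.
url = "https://raw.githubusercontent.com/YBI-Foundation/Dataset/refs/heads/main/Online%20Purchase.csv"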
df2 = pd.read_csv(url)
print(df2.to_string())
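Method 3: Reading a File by Its Path
If the file is already in the Colab file system (for example, uploaded through the Files sidebar or with Method 1), you can read it by its path. header=None tells pandas that the file has no header row, so the first line is kept as data instead of being used as column names; this matters for transaction files in which every row is a basket.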
df = pd.read_csv('/content/purchased_data.csv', header=None)
print(df.to_string())
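A small in-memory example shows what header=None changes. The basket data is made up, and io.StringIO stands in for a file on disk. pandas takes the number of columns from the first line, so the sketch lists the longest basket first; a file whose later rows have more items would need explicit column names (for example, names=list(range(n))).

```python
import io

import pandas as pd

# Made-up transaction file: one basket per line, no header row
csv_text = "bread,butter,milk\nbread,milk\nmilk\n"

# Default header handling: the first basket is consumed as column names
with_header = pd.read_csv(io.StringIO(csv_text))
# header=None: every line is kept as a transaction; columns are labelled 0, 1, 2
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(len(with_header), list(with_header.columns))
print(len(no_header), list(no_header.columns))
print(no_header.to_string())
```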
Association Rule Learning
Step 1: Prepare the Data
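The apriori function from apyori expects a list of transactions, where each transaction is a list of item strings. The loop below turns each DataFrame row into one transaction and skips empty (NaN) cells, which pandas fills in when rows contain different numbers of items.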
# Convert each DataFrame row into one transaction (a list of item strings),
# skipping empty (NaN) cells
records = []
for i in range(len(df)):
    row = []
    for j in range(len(df.columns)):
        value = df.values[i, j]
        if pd.notna(value):
            row.append(str(value))
    records.append(row)
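The same conversion can be written more compactly with dropna, and wrapping it in a function avoids repeating the loops for each dataset. The function name to_transactions and the sample data are hypothetical; like the loop above, it assumes every non-empty cell is an item.

```python
import numpy as np
import pandas as pd


def to_transactions(frame: pd.DataFrame) -> list:
    """Turn each row of `frame` into a list of item strings, dropping
    empty (NaN) cells. Same result as the nested loops above."""
    return [row.dropna().astype(str).tolist() for _, row in frame.iterrows()]


# Made-up baskets, shaped like a CSV read with header=None
demo = pd.DataFrame([["bread", "butter", "milk"],
                     ["bread", "milk", np.nan],
                     ["milk", np.nan, np.nan]])
demo_records = to_transactions(demo)
print(demo_records)
```

With the function defined, records = to_transactions(df) and records2 = to_transactions(df2) can replace the two loop blocks.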
Step 2: Install Required Libraries
If you haven't already, install the apyori library. In Colab, the ! prefix runs the line as a shell command.
!pip install apyori
Step 3: Apply the Apriori Algorithm
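Now apply the Apriori algorithm to find association rules. min_support is the minimum fraction of transactions that must contain an itemset, and min_confidence is the minimum fraction of transactions containing a rule's left-hand side that also contain its right-hand side. With min_support=0.5 and min_confidence=0.75, only itemsets present in at least half of the baskets, and rules that hold at least 75% of the time, are kept.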
from apyori import apriori
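# Itemsets must appear in at least 50% of transactions (min_support=0.5);
# rules must hold in at least 75% of the transactions that contain their left-hand side
association_rule = apriori(records, min_support=0.5, min_confidence=0.75)
# apriori returns a generator, so collect the results into a list before printing
association_result = list(association_rule)
print(association_result)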
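To see what the two thresholds filter, the sketch below computes support and confidence by hand for one-item-to-one-item rules on four made-up baskets. It is a simplified illustration of the definitions, not apyori's implementation: apyori also searches larger itemsets and reports lift.

```python
from itertools import combinations

# Made-up baskets (not taken from purchased_data.csv)
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]


def support(itemset: set) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs: set, rhs: set) -> float:
    """support(lhs | rhs) / support(lhs): how often rhs appears when lhs does."""
    return support(lhs | rhs) / support(lhs)


min_support, min_confidence = 0.5, 0.75
items = sorted(set().union(*transactions))

# Keep item pairs whose support clears min_support, then test both rule directions
rules = []
for a, b in combinations(items, 2):
    pair_support = support({a, b})
    if pair_support >= min_support:
        for lhs, rhs in ((a, b), (b, a)):
            c = confidence({lhs}, {rhs})
            if c >= min_confidence:
                rules.append((lhs, rhs, pair_support, c))

for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

Here bread and butter co-occur in half of the baskets, so the pair passes min_support; butter -> bread has confidence 1.0 and is kept, while bread -> butter (about 0.67) falls below min_confidence.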
Applying Association Rule Learning to a Different Dataset
You can repeat the process for another dataset (e.g., Fashion.csv) with different support and confidence thresholds.
# Load another dataset (no header row, one transaction per line)
df2 = pd.read_csv('/content/Fashion.csv', header=None)
print(df2.to_string())
# Prepare records for the new dataset, skipping empty (NaN) cells
records2 = []
for i in range(len(df2)):
    row = []
    for j in range(len(df2.columns)):
        value = df2.values[i, j]
        if pd.notna(value):
            row.append(str(value))
    records2.append(row)
# Apply the Apriori algorithm to the new dataset
association_rule2 = apriori(records2, min_support=0.4, min_confidence=0.6)
association_result2 = list(association_rule2)
print(association_result2)
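The printed result is a list of apyori RelationRecord named tuples, which is hard to read. The hypothetical helper rules_table below flattens them into a DataFrame with one row per rule. It assumes the field names found in apyori's source (items, support, ordered_statistics, and items_base, items_add, confidence, lift on each statistic); check them against the installed version. Stand-in records with the same shape let the sketch run without apyori.

```python
from collections import namedtuple

import pandas as pd


def rules_table(results) -> pd.DataFrame:
    """Flatten apyori-style RelationRecord objects into one row per rule.
    Assumes apyori's field names; hypothetical helper."""
    rows = []
    for record in results:
        for stat in record.ordered_statistics:
            # Skip "{} -> X" entries: their confidence just restates the support
            if not stat.items_base:
                continue
            rows.append({
                "antecedent": ", ".join(sorted(stat.items_base)),
                "consequent": ", ".join(sorted(stat.items_add)),
                "support": record.support,
                "confidence": stat.confidence,
                "lift": stat.lift,
            })
    return pd.DataFrame(rows, columns=["antecedent", "consequent",
                                       "support", "confidence", "lift"])


# Stand-in records shaped like apyori's output (made-up values)
Record = namedtuple("Record", "items support ordered_statistics")
Stat = namedtuple("Stat", "items_base items_add confidence lift")
sample = [
    Record(frozenset({"milk"}), 0.75,
           [Stat(frozenset(), frozenset({"milk"}), 0.75, 1.0)]),
    Record(frozenset({"bread", "butter"}), 0.5,
           [Stat(frozenset({"butter"}), frozenset({"bread"}), 1.0, 4 / 3)]),
]
table = rules_table(sample)
print(table.to_string(index=False))
```

In the notebook, print(rules_table(association_result2).to_string(index=False)) would show the Fashion.csv rules as a table.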