Allen's 데이터 맛집

[Machine Learning] Classification: Predicting Whether Ordered Products Arrive On Time from Customer Purchase Data



Allen93 2023. 9. 10. 22:40

In this post, we use Kaggle's 'E-Commerce Shipping Data' customer purchase dataset to predict whether a customer's order reached them on time (Reached.on.Time_Y.N).


About Dataset

Context

An international e-commerce company wants to discover key insights from its customer database. It wants to use some of the most advanced machine learning techniques to study its customers. The company sells electronic products.

Content

The dataset used for model building contained 10999 observations of 12 variables.
The data contains the following information:

  • ID: Customer ID number.
  • Warehouse block: The company has a large warehouse divided into blocks A, B, C, D, and E.
  • Mode of shipment: The company ships products in multiple ways: Ship, Flight, and Road.
  • Customer care calls: The number of calls made to enquire about the shipment.
  • Customer rating: The rating given by each customer, from 1 (lowest, worst) to 5 (highest, best).
  • Cost of the product: Cost of the product in US dollars.
  • Prior purchases: The number of prior purchases.
  • Product importance: The company has categorized each product's importance as low, medium, or high.
  • Gender: Male or Female.
  • Discount offered: Discount offered on that specific product.
  • Weight in gms: The weight of the product in grams.
  • Reached on time: The target variable, where 1 indicates that the product did NOT reach on time and 0 indicates that it reached on time.

Acknowledgements

I would like to specify that I am only making the collected product-shipment data available to Kagglers via GitHub. I built it as my Customer Analytics project, which is stored in a GitHub repository.

Inspiration

This product shipment tracking data can help answer questions such as:

  • What was the customer rating? Was the product delivered on time?
  • Are customer queries being answered?
  • If product importance is high, does the product get the highest rating or get delivered on time?

Data source: https://www.kaggle.com/datasets/prachi13/customer-analytics


Loading and Exploring the Data

Load the xtrain, xtest, ytrain, and ytest datasets.
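A minimal pandas sketch of this loading-and-exploration step. It assumes the split files use the same column names as the Kaggle CSV (ID, Warehouse_block, Mode_of_Shipment, and so on, with Reached.on.Time_Y.N as the label). The two tiny CSV strings are hypothetical placeholder rows, not real data; with the actual files you would pass their paths to pd.read_csv instead of io.StringIO objects.

```python
import io

import pandas as pd

# Hypothetical placeholder rows (NOT real data) in the assumed column layout.
XTRAIN_CSV = """ID,Warehouse_block,Mode_of_Shipment,Customer_care_calls,Customer_rating,Cost_of_the_Product,Prior_purchases,Product_importance,Gender,Discount_offered,Weight_in_gms
101,A,Ship,4,2,180,3,low,F,12,1450
102,C,Flight,3,5,215,2,medium,M,5,4100
103,E,Road,5,1,260,4,high,F,40,1900
104,B,Ship,2,4,150,3,low,M,3,5300
"""
YTRAIN_CSV = """ID,Reached.on.Time_Y.N
101,1
102,0
103,1
104,0
"""

# With the real files, pass their paths here instead of StringIO objects.
xtrain = pd.read_csv(io.StringIO(XTRAIN_CSV))
ytrain = pd.read_csv(io.StringIO(YTRAIN_CSV))

# Shape, dtypes and missing values per column
print(xtrain.shape, ytrain.shape)
xtrain.info()
print(xtrain.isna().sum())

# Non-numeric (categorical) columns are the ones that will need encoding later
cat_cols = xtrain.select_dtypes(exclude="number").columns.tolist()
print("categorical columns:", cat_cols)

# Class balance of the target (1 = NOT on time, 0 = on time, per the dataset description)
print(ytrain["Reached.on.Time_Y.N"].value_counts(normalize=True))

# Features and labels should line up row by row via the ID column
assert (xtrain["ID"] == ytrain["ID"]).all()
```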

 

Data Preprocessing

 

After checking the proportion of each category within every variable, apply label encoding to the object-type columns.
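One way this step could look, sketched with scikit-learn's LabelEncoder on a few hypothetical toy rows (the column names follow the Kaggle CSV; the values are made up). The encoder is fitted on the train and test values together, an assumption made here so that a category appearing only in xtest does not raise an unseen-label error when transformed.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy rows, NOT real data
xtrain = pd.DataFrame({
    "Warehouse_block": ["A", "B", "C", "A"],
    "Mode_of_Shipment": ["Ship", "Flight", "Ship", "Road"],
    "Product_importance": ["low", "medium", "high", "low"],
    "Gender": ["F", "M", "M", "F"],
    "Customer_rating": [2, 5, 3, 4],
})
xtest = pd.DataFrame({
    "Warehouse_block": ["D", "E"],
    "Mode_of_Shipment": ["Road", "Flight"],
    "Product_importance": ["medium", "high"],
    "Gender": ["M", "F"],
    "Customer_rating": [1, 3],
})

cat_cols = xtrain.select_dtypes(exclude="number").columns.tolist()

# 1) Proportion of each category within every categorical variable
for col in cat_cols:
    print(xtrain[col].value_counts(normalize=True).round(3), "\n")

# 2) Label encoding: classes are sorted, then mapped to 0, 1, 2, ...
for col in cat_cols:
    le = LabelEncoder()
    le.fit(pd.concat([xtrain[col], xtest[col]]))  # fit on train+test categories
    xtrain[col] = le.transform(xtrain[col])
    xtest[col] = le.transform(xtest[col])

print(xtrain)
```

Because LabelEncoder sorts classes alphabetically, Product_importance becomes high=0, low=1, medium=2, which does not follow the natural order. scikit-learn documents LabelEncoder as intended for target labels; OrdinalEncoder (with an explicit category order) or pd.get_dummies are common alternatives for feature columns.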

 

Building and Training the Model

We use scikit-learn's train_test_split function (from sklearn.model_selection) to split the data.
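A minimal sketch of the split, assuming the labelled training data is divided into a training part and a validation part. The 100-row frame below is random, already-numeric stand-in data (hypothetical), and the 80/20 ratio, stratify, and random_state values are illustrative choices, not necessarily the author's settings.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical, already-encoded stand-ins for xtrain / ytrain (random values)
rng = np.random.default_rng(0)
xtrain = pd.DataFrame({
    "Customer_care_calls": rng.integers(2, 8, size=100),
    "Discount_offered": rng.integers(1, 66, size=100),
    "Weight_in_gms": rng.integers(1000, 7000, size=100),
})
ytrain = pd.Series(rng.integers(0, 2, size=100), name="Reached.on.Time_Y.N")

# Hold out 20% as a validation set; stratify keeps the 0/1 ratio of the
# target roughly equal in both parts, random_state makes the split repeatable.
x_tr, x_val, y_tr, y_val = train_test_split(
    xtrain, ytrain, test_size=0.2, stratify=ytrain, random_state=42
)
print(x_tr.shape, x_val.shape)
print(y_tr.mean(), y_val.mean())
```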

 

We then use a LogisticRegression() model to compute the predictions.
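A sketch of fitting and evaluating the model, using synthetic data from make_classification (10 features, loosely mirroring the 10 non-ID columns) rather than the real dataset. The StandardScaler step and max_iter=1000 are assumptions added here to help the default lbfgs solver converge when features have very different ranges (for example Weight_in_gms versus Customer_rating); they are not confirmed as the author's setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the shipping dataset)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
x_tr, x_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale features, then fit a logistic regression classifier
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(x_tr, y_tr)

pred = model.predict(x_val)               # hard 0/1 predictions
proba = model.predict_proba(x_val)[:, 1]  # probability of class 1
# (with the real data, class 1 = NOT on time, per the dataset description)

acc = accuracy_score(y_val, pred)
auc = roc_auc_score(y_val, proba)
print(f"accuracy: {acc:.3f}  ROC AUC: {auc:.3f}")
```

Reporting ROC AUC alongside accuracy is a design choice: it scores the ranking of predict_proba outputs, so it does not depend on the default 0.5 decision threshold.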

 
