Predictive Analytics with Online Retail
This descriptive analysis of online retail uses a dataset from Kaggle, which can be downloaded at https://www.kaggle.com/sanjeet41/online-retail. The analysis is intended for companies that hold online retail data and want to grow the business by adopting policies based on the analysts' recommendations.
To carry out the predictive analysis, I used the R language with the RStudio IDE. Before conducting the analysis, the first step is to import the required packages, including:
ggplot2, library(ggplot2), is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics and what graphical primitives to use, and it takes care of the details.
readr, library(readr), is a package for reading datasets in formats such as csv, tsv, and fwf. It is designed to parse the data in a dataset flexibly.
magrittr, library(magrittr), (to be pronounced with a sophisticated French accent) is a package with two aims: to decrease development time and to improve readability and maintainability of code. Or even shortr: to make your code smokin’ (puff puff).
dplyr, library(dplyr), is an R package for manipulating data. It was developed by Hadley Wickham and Romain François and provides several easy-to-use functions. It is very useful for data analysis and exploration.
reshape2, library(reshape2), is a reboot of the reshape package. It flexibly restructures and aggregates data using just two functions: ‘melt’ and ‘dcast’ (or ‘acast’).
lubridate, library(lubridate), makes dealing with dates a little easier. It provides functions to work with date-times and time-spans: fast and user-friendly parsing of date-time data, extraction and updating of the components of a date-time (years, months, days, hours, minutes, and seconds), and algebraic manipulation of date-time and time-span objects.
1. Install Packages and Import Data
Before starting the analysis, install and load the packages listed above: ggplot2 for data visualization, ‘readr’ for importing data, and ‘dplyr’ for data manipulation. Next, load the dataset as online_retail from the directory where the data file is stored.
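The import step might look like the sketch below. The file name `online_retail.csv` and the helper `load_online_retail()` are assumptions made for illustration; the post does not show the path that was actually used.

```r
# Run once if the packages are not installed yet:
# install.packages(c("ggplot2", "readr", "dplyr"))

library(ggplot2)  # data visualization
library(readr)    # data import
library(dplyr)    # data manipulation

# Hypothetical helper: reads the Kaggle CSV from a path you supply.
load_online_retail <- function(path) {
  read_csv(path)
}

# Hypothetical location of the downloaded file; adjust to your own directory.
csv_path <- "online_retail.csv"
if (file.exists(csv_path)) {
  online_retail <- load_online_retail(csv_path)
}
```

The `file.exists()` guard only keeps the sketch from failing when the file is stored somewhere else.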
2. Check the Dimensions and Variables of the Data
The picture above shows that the online retail data, stored under the name customer, has 240007 rows and 8 columns (variables). These variables are: InvoiceNo, the purchase code, with factor data type; StockCode, the item code, with factor data type; Description, the item description or item name, with factor data type; Quantity, the number of goods purchased by the customer, with integer data type; InvoiceDate, the time or date of purchase, with factor data type; UnitPrice, the price of the product per unit, with numeric data type; CustomerID, the unique code that identifies each customer (buyer); and Country, the customer's country of origin, with factor data type.
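As a rough illustration of this check, the sketch below builds a toy data frame with the same eight column names and then inspects it with `dim()` and `str()`. The three rows are made up for the example and are not taken from the Kaggle file.

```r
# Toy stand-in for the real data: three made-up rows, same eight columns.
customer <- data.frame(
  InvoiceNo   = c("A0001", "A0001", "A0002"),
  StockCode   = c("P1", "P2", "P1"),
  Description = c("ITEM ONE", "ITEM TWO", "ITEM ONE"),
  Quantity    = c(6L, 2L, 4L),
  InvoiceDate = c("2011-05-12 12:30", "2011-05-12 12:30", "2011-05-15 11:00"),
  UnitPrice   = c(2.5, 1.0, 2.5),
  CustomerID  = c(1L, 1L, NA),
  Country     = c("United Kingdom", "United Kingdom", "France"),
  stringsAsFactors = TRUE  # text columns become factors, as in the description above
)

dim(customer)  # number of rows, then number of columns
str(customer)  # name and data type of every variable
```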
3. Data Cleansing
The next step is to check the customer dataset for NULL (missing) values. Using the sapply function, the CustomerID variable is found to have 67225 NULL values, so the next step is to remove those rows using the dplyr library.
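A minimal sketch of this cleansing step is shown below, assuming the missing values appear as `NA` in R. The toy rows and the name `customer_clean` are made up for illustration.

```r
library(dplyr)

# Toy data with made-up IDs; NA marks a missing CustomerID.
customer <- data.frame(
  CustomerID = c(1L, NA, 2L, NA),
  Quantity   = c(6L, 2L, 4L, 1L)
)

# Count the missing values in every column.
na_counts <- sapply(customer, function(column) sum(is.na(column)))
print(na_counts)

# Keep only the rows where CustomerID is present.
customer_clean <- customer %>%
  filter(!is.na(CustomerID))
```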
3. Frequency of Online Retail Transactions Based on Customer Country
Code for this step (gist): https://gist.github.com/trijuhari/6cdb99d3c7963c9155c7db1e44f546f0
The picture above shows that customers are spread across 38 countries. The country with the most online retail transactions is the United Kingdom, with a total of 220279. The country with the fewest is the United Arab Emirates, with 1.
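One way to build such a frequency table is with `count()` from dplyr, sketched below on a few made-up rows. The column name `transactions` is an assumption chosen for the example.

```r
library(dplyr)

# Toy data: one made-up row per transaction line.
customer <- data.frame(
  Country = c("United Kingdom", "United Kingdom", "France",
              "United Kingdom", "Germany", "France")
)

# Count the rows per country, most frequent first.
country_freq <- customer %>%
  count(Country, sort = TRUE, name = "transactions")

print(country_freq)
nrow(country_freq)  # number of distinct countries
```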
4. Plotting Frequency of Online Retail Transactions Based on Customer Country
Based on the dataset above, customers come from 38 countries. The country with the most customers is the UK, with 220279, and the country with the fewest is the UAE, with 1.
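A bar chart of these frequencies could be drawn with ggplot2 roughly as follows. This is a sketch on made-up counts; the object names and the axis labels are assumptions, not the code behind the original plot.

```r
library(ggplot2)

# Toy frequency table with made-up counts.
country_freq <- data.frame(
  Country      = c("United Kingdom", "France", "Germany"),
  transactions = c(3L, 2L, 1L)
)

# Horizontal bars, ordered so the largest count appears at the top.
country_plot <- ggplot(
  country_freq,
  aes(x = reorder(Country, transactions), y = transactions)
) +
  geom_col() +
  coord_flip() +
  labs(
    x = "Country",
    y = "Number of transactions",
    title = "Online retail transactions by customer country"
  )

print(country_plot)
```

Flipping the coordinates keeps long country names readable when there are many bars.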
5. Frequency of Sales Day by Day
6. Plotting Frequency of Sales Day by Day
The frequency of sales per day shows how many products were sold on each day of the week. The highest sales occurred on Thursday with 805537 products, followed by Tuesday with 732736 products, Wednesday with 690984 products, Friday with 555412 products, and Monday with 518657 products. The lowest sales occurred on Sunday with 322900 products sold.
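Totals per weekday can be computed with lubridate and dplyr, as in the sketch below. It assumes InvoiceDate has already been parsed to a date-time; the toy rows are made up, and the names `weekday` and `total_quantity` are chosen for the example.

```r
library(dplyr)
library(lubridate)

# Toy data with made-up quantities; InvoiceDate is already a date-time.
customer <- data.frame(
  InvoiceDate = as.POSIXct(
    c("2011-05-09 10:15:00", "2011-05-12 12:30:00",
      "2011-05-12 13:05:00", "2011-05-15 11:00:00"),
    tz = "UTC"
  ),
  Quantity = c(5L, 10L, 5L, 2L)
)

# Sum the quantity sold on each day of the week, highest first.
daily_sales <- customer %>%
  mutate(weekday = wday(InvoiceDate, label = TRUE, abbr = FALSE)) %>%
  group_by(weekday) %>%
  summarise(total_quantity = sum(Quantity)) %>%
  arrange(desc(total_quantity))

print(daily_sales)
```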
7. Loyal Customers
8. Products That Bring the Most Revenue
The picture above shows the 10 products most in demand by customers. The most popular item is product code 22423, with 101062 products sold, followed by code DOT with 87936 products, and so on: the lower a product ranks, the fewer buyers it has.
The image above ranks the top 10 countries by the number of products sold. The highest sales were in the United Kingdom, with a total of 3572911 products.
It is followed by the Netherlands with 125721 products, Germany with 103526 products, EIRE with 99384 products, France with 87443 products, Australia with 79071 products, Spain with 24723 products, Switzerland with 22654 products, Japan with 21133 products, and, in tenth place, Belgium with 17251 products. The lower a country ranks, the fewer products are sold in that country.
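Both rankings above follow the same pattern: group by a column, sum the quantity, sort, and keep the first rows. The sketch below wraps that pattern in a hypothetical helper, `top_by_quantity()`, and applies it to made-up rows. It is not the code behind the original pictures.

```r
library(dplyr)

# Hypothetical helper: total Quantity per group, largest first, top n rows.
top_by_quantity <- function(data, group_col, n = 10) {
  data %>%
    group_by(across(all_of(group_col))) %>%
    summarise(total_quantity = sum(Quantity), .groups = "drop") %>%
    arrange(desc(total_quantity)) %>%
    head(n)
}

# Toy data with made-up product codes and quantities.
customer <- data.frame(
  StockCode = c("P1", "P2", "P1", "P3"),
  Country   = c("United Kingdom", "France", "United Kingdom", "Germany"),
  Quantity  = c(10L, 4L, 5L, 1L)
)

top_products  <- top_by_quantity(customer, "StockCode")
top_countries <- top_by_quantity(customer, "Country", n = 2)
print(top_products)
print(top_countries)
```

The same helper would work for any other grouping column, such as CustomerID.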
9. Hourly Sales Frequency
High sales volume also requires additional staff to keep the sales process stable. The plot above shows the number of products sold at each hour, so the company owner can decide at which times additional staff are needed.
The picture above shows that sales start in the morning and the number of products sold keeps increasing until it peaks at 12:00 with 361320 sales, then decreases significantly until the evening.
The top 5 hours are 12:00, then 13:00 with 356564 sales, 14:00 with 309820 sales, 15:00 with 273300 sales, and 11:00 with 258016 sales. They are followed by 10:00 with 175610 sales, and so on in decreasing order.
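The hourly ranking can be reproduced along the lines of the sketch below, again assuming InvoiceDate is already a date-time. The rows are made up and the names `sale_hour` and `top_hours` are chosen for illustration.

```r
library(dplyr)
library(lubridate)

# Toy data with made-up quantities.
customer <- data.frame(
  InvoiceDate = as.POSIXct(
    c("2011-05-12 11:10:00", "2011-05-12 12:30:00",
      "2011-05-12 12:45:00", "2011-05-12 13:05:00"),
    tz = "UTC"
  ),
  Quantity = c(3L, 10L, 5L, 4L)
)

# Sum the quantity sold in each hour of the day, highest first.
hourly_sales <- customer %>%
  mutate(sale_hour = hour(InvoiceDate)) %>%
  group_by(sale_hour) %>%
  summarise(total_quantity = sum(Quantity)) %>%
  arrange(desc(total_quantity))

# The five busiest hours (fewer if the data covers fewer hours).
top_hours <- head(hourly_sales, 5)
print(top_hours)
```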
10. Monthly Sales Frequency
The following plot shows product sales per month, so the company owner can find the peak sales month and take it into account when setting policies to attract consumers.
The plot shows that the highest sales occurred in the 5th month, May, with 648251 products, and the lowest sales occurred in April with 426048 products.
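To pick out the strongest and weakest months programmatically, something like the following sketch could be used. It runs on made-up rows, assumes InvoiceDate is already a date-time, and the names `busiest_month` and `slowest_month` are illustrative.

```r
library(dplyr)
library(lubridate)

# Toy data with made-up quantities spread over three months.
customer <- data.frame(
  InvoiceDate = as.POSIXct(
    c("2011-04-10 09:00:00", "2011-05-03 12:00:00",
      "2011-05-20 14:00:00", "2011-06-01 10:00:00"),
    tz = "UTC"
  ),
  Quantity = c(2L, 10L, 5L, 4L)
)

# Sum the quantity sold in each calendar month (1 = January).
monthly_sales <- customer %>%
  mutate(sale_month = month(InvoiceDate)) %>%
  group_by(sale_month) %>%
  summarise(total_quantity = sum(Quantity))

# Months with the highest and the lowest totals.
busiest_month <- monthly_sales %>% slice_max(total_quantity, n = 1)
slowest_month <- monthly_sales %>% slice_min(total_quantity, n = 1)
print(busiest_month)
print(slowest_month)
```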
11. Top 10 Customers
The customer with ID 14646 purchased the most through online retail, with a total of 121929 products. Second place is occupied by the customer with ID 18102, with a total purchase of 106443 products, and third place by the customer with ID 12415, with 73717 products, and so on: the lower a customer ranks, the fewer products they purchased.
12. Conclusion
After conducting the descriptive analysis, several policies can be recommended to promote the company, including:
1. Offer free shipping or a postage discount on purchases from the countries with the most customers, based on the analysis of transaction frequency by country.
2. Increase the number of staff on Thursday based on the analysis of the highest frequency of sales per day.
3. Give customers points for every purchase that can be exchanged for certain products, based on the analysis of purchase frequency by customer ID.
4. Give a discount in May based on the analysis of the highest sales per month.
5. Run a flash sale at 12:00 based on the analysis of the highest sales per hour.
6. Hold quizzes with shopping vouchers as prizes on Thursday based on the analysis of the highest frequency of sales per day.
7. Give 10% cashback on product purchases with code 22423 based on an analysis of the most frequently sold products.