Sequence of shopping carts analysis with R(0) – Sankey diagram
Source: Internet. Editor: 程序博客网. Date: 2024/06/05 20:57
We studied how to visualize the structure of a shopping cart in the previous post. Although there is a great deal of material on how to analyze combinations of products in a shopping cart (e.g. via association rules), there are few sources on how to analyze sequences of shopping carts. This post is an attempt to fill that gap.
Sequence analysis of shopping carts can reveal useful patterns in customer behavior, such as dependencies between product sets. For example, suppose a client bought products A and B in the first cart and only product A in the second and third carts. He probably wasn’t satisfied with product B (its price, quality, etc.). Or you may discover that clients very often purchase product D after an “A, B, C” cart, which gives you the opportunity to recommend D to clients who didn’t purchase it after an “A, B, C” cart.
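As a rough illustration of the second pattern, the share of each "next cart" after a given cart can be tabulated in base R. The `carts` data frame and its values below are made up for this sketch and are not part of the data set used in this post.

```r
# Hypothetical toy data: one row per client, two consecutive carts per row.
carts <- data.frame(
  first  = c("a;b;c", "a;b;c", "a;b", "a;b;c"),
  second = c("d",     "d",     "a",   "a"),
  stringsAsFactors = FALSE
)

# Keep clients whose first cart was "a;b;c" and tabulate their next cart.
after.abc <- carts$second[carts$first == "a;b;c"]
next.share <- prop.table(table(after.abc))
print(next.share)
```

Clients who had an "a;b;c" cart but did not buy the most frequent follow-up product would be natural candidates for a recommendation.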
As I’m a big fan of visualization I will recommend an interesting chart for this analysis: Sankey diagram. So, let’s start!
After we load the necessary libraries with the following code,
# loading libraries
library(googleVis)
library(dplyr)
library(reshape2)
we will simulate an example of the data set. Suppose we sell 3 products (or product categories), A, B and C, and each product can be sold with a different probability. Also, a client can purchase any combinations of products. Let’s do this with the following code:
# creating an example of orders
set.seed(15)
df <- data.frame(orderId=c(1:1000),
                 clientId=sample(c(1:300), 1000, replace=TRUE),
                 prod1=sample(c('NULL', 'a'), 1000, replace=TRUE, prob=c(0.15, 0.5)),
                 prod2=sample(c('NULL', 'b'), 1000, replace=TRUE, prob=c(0.15, 0.3)),
                 prod3=sample(c('NULL', 'c'), 1000, replace=TRUE, prob=c(0.15, 0.2)))
# combining products
df$cart <- paste(df$prod1, df$prod2, df$prod3, sep=';')
df$cart <- gsub('NULL;|;NULL', '', df$cart)
df <- df[df$cart!='NULL', ]
df <- df %>%
  select(orderId, clientId, cart) %>%
  arrange(clientId, orderId, cart)
We generated 1000 orders from 300 clients (orders with an empty cart are dropped), and our data set looks like this:
head(df)
##   orderId clientId  cart
## 1     451        1 a;b;c
## 2     217        2   a;b
## 3     261        2   a;b
## 4     577        2   a;b
## 5     902        2     c
## 6     199        3 a;b;c
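The cart-cleaning step is easy to check on a handful of strings. This is a small sketch with made-up inputs (the `raw` vector is hypothetical) that applies the same `gsub()` pattern used above.

```r
# Hypothetical raw cart strings, built the same way as df$cart above.
raw <- c("a;b;c", "a;NULL;c", "NULL;b;NULL", "NULL;NULL;c", "NULL;NULL;NULL")

# Remove every 'NULL' placeholder together with its adjacent separator.
clean <- gsub("NULL;|;NULL", "", raw)
print(clean)

# An order with no products collapses to the bare string 'NULL' and is dropped.
kept <- clean[clean != "NULL"]
print(kept)
```

The last string shows why the data set can end up with fewer rows than were generated: a cart in which all three products are 'NULL' is removed.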
After this, we need to number each client's orders in sequence with the following code. Note: we assume that the order/cart serial numbers were assigned in purchase-date order. Otherwise, you can use the purchase date to identify the sequence.
orders <- df %>%
  group_by(clientId) %>%
  mutate(n.ord = paste('ord', c(1:n()), sep=''))
The head of the data frame we obtain is:
head(orders)
##   orderId clientId  cart n.ord
## 1     451        1 a;b;c  ord1
## 2     217        2   a;b  ord1
## 3     261        2   a;b  ord2
## 4     577        2   a;b  ord3
## 5     902        2     c  ord4
## 6     199        3 a;b;c  ord1
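If order IDs do not follow the purchase order, the sequence can be derived from a date column instead, as the note above suggests. The sketch below assumes a hypothetical `date` column and made-up values; it sorts by date within each client before numbering the orders.

```r
library(dplyr)

# Hypothetical orders whose IDs do not follow the purchase order.
df.dates <- data.frame(
  orderId  = c(10, 3, 7, 5),
  clientId = c(1, 1, 2, 1),
  cart     = c("a", "a;b", "c", "b"),
  date     = as.Date(c("2024-01-05", "2024-02-01", "2024-01-20", "2024-01-15")),
  stringsAsFactors = FALSE
)

# Sort by purchase date within each client, then number the orders.
orders.by.date <- df.dates %>%
  arrange(clientId, date) %>%
  group_by(clientId) %>%
  mutate(n.ord = paste0("ord", row_number())) %>%
  ungroup()

print(as.data.frame(orders.by.date))
```

Here order 5 becomes the second order of client 1 even though its ID is larger than that of order 3, because it was placed earlier.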
The next step is to create a matrix with sequences with the following code:
orders <- dcast(orders, clientId ~ n.ord, value.var='cart', fun.aggregate = NULL)
The head of the data frame we obtain is:
##   clientId  ord1 ord10 ord11 ord2  ord3 ord4 ord5 ord6 ord7 ord8 ord9
## 1        1 a;b;c  <NA>  <NA> <NA>  <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2        2   a;b  <NA>  <NA>  a;b   a;b    c <NA> <NA> <NA> <NA> <NA>
## 3        3 a;b;c  <NA>  <NA>  a;b     a <NA> <NA> <NA> <NA> <NA> <NA>
## 4        4   a;c  <NA>  <NA>    a   a;c  b;c  a;b <NA> <NA> <NA> <NA>
## 5        5 a;b;c  <NA>  <NA>  a;c a;b;c    a <NA> <NA> <NA> <NA> <NA>
## 6        6     a  <NA>  <NA>  b;c     b <NA> <NA> <NA> <NA> <NA> <NA>
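Note that the order columns in the output above are sorted alphabetically (ord1, ord10, ord11, ord2, ...). If you want them in numeric order, one possible approach in base R is sketched below; the small `wide` data frame is a made-up stand-in for the real table.

```r
# Hypothetical wide table with the same alphabetical column order as above.
wide <- data.frame(clientId = 1:2,
                   ord1 = c("a", "b"), ord10 = NA, ord11 = NA,
                   ord2 = c("a;b", NA), ord3 = NA,
                   stringsAsFactors = FALSE)

# Pick the order columns and sort them by their numeric suffix.
ord.cols <- grep("^ord[0-9]+$", names(wide), value = TRUE)
ord.cols <- ord.cols[order(as.integer(sub("ord", "", ord.cols)))]

wide <- wide[, c("clientId", ord.cols)]
print(names(wide))
```

This only matters when columns are selected by position or when the whole table is inspected; selecting columns by name works either way.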
Next, we just need to choose how many carts/orders of the sequence we want to analyze. I will choose the first 5 carts with the following code:
orders <- orders %>%
  select(ord1, ord2, ord3, ord4, ord5)
Also, if you have many more product combinations than the 7 in my example, you can limit them with the filter() function (e.g. filter(ord1=='a;b;c')) for clarity.
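A minimal sketch of that filtering step, using a made-up `seqs` data frame in place of the real sequence table:

```r
library(dplyr)

# Hypothetical sequence table: one row per client, one column per cart.
seqs <- data.frame(
  ord1 = c("a;b;c", "a", "a;b;c", "b"),
  ord2 = c("a", "a;b", NA, "c"),
  stringsAsFactors = FALSE
)

# Keep only the sequences that start with the cart 'a;b;c'.
abc.first <- seqs %>%
  filter(ord1 == "a;b;c")

print(abc.first)
```

The resulting diagram then shows only the paths that begin with the chosen cart, which is easier to read when there are many combinations.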
And finally we will create a data set for plotting with the following code:
orders.plot <- data.frame()
for (i in 2:ncol(orders)) {
  ord.cache <- orders %>%
    group_by(orders[ , i-1], orders[ , i]) %>%
    summarise(n=n())
  colnames(ord.cache)[1:2] <- c('from', 'to')
  # adding tags to carts
  ord.cache$from <- paste(ord.cache$from, '(', i-1, ')', sep='')
  ord.cache$to <- paste(ord.cache$to, '(', i, ')', sep='')
  orders.plot <- rbind(orders.plot, ord.cache)
}
Note: I tagged each combination with its position in the sequence because a Sankey diagram cannot link a node to itself, e.g. from product A to product A. So, I transformed the sequence A –> A into A(1) –> A(2).
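To make the tagging step concrete, here is a minimal base R sketch for a single pair of positions. The `seqs` data frame is made up, and `table()` is swapped in for the dplyr grouping used in the loop; unlike the loop, `table()` silently drops NA values by default.

```r
# Hypothetical first and second carts of four clients.
seqs <- data.frame(
  ord1 = c("a", "a", "b", "a"),
  ord2 = c("a", "b", "a", "a"),
  stringsAsFactors = FALSE
)

# Count each (from, to) pair between the first and second cart.
# Note: table() ignores NA by default, so ended sequences are not counted here.
pairs <- as.data.frame(table(from = seqs$ord1, to = seqs$ord2),
                       stringsAsFactors = FALSE)
pairs <- pairs[pairs$Freq > 0, ]
names(pairs)[3] <- "n"

# Tag nodes with their position so that a -> a becomes a(1) -> a(2).
pairs$from <- paste0(pairs$from, "(1)")
pairs$to   <- paste0(pairs$to, "(2)")
print(pairs)
```

After tagging, no label appears on both sides of a link, so a repeated cart is drawn as a flow between two distinct nodes.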
Finally, we plot the Sankey diagram with the following code:
plot(gvisSankey(orders.plot, from='from', to='to', weight='n',
                options=list(height=900, width=1800,
                             sankey="{link:{color:{fill:'lightblue'}}}")))
The width of each band corresponds to the weight of the sequence, i.e. the number of clients who followed it. You can also highlight any cart/order and its path through the sequence. The size of the plot can be changed via the height and width parameters. Note: the NAs in our chart mean that the sequence ended. Feel free to share your ideas and comments!