Everyday R Code (19) – A function example to compare loops and vectorization speed

#to compare speed of 2 methods: loops and vectorization
n=10
#generate 100 numbers between 1 and 1000, and then make a 10 by 10 matrix
A=matrix(runif(100,1,1000),nrow=n,ncol=n)
B=matrix(runif(100,1,100),nrow=n,ncol=n)

#method 1
A%*%B
#get system.time to compare with the other method
system.time(A%*%B)

#method 2
#using a function
MultiplyMatrices=function(A,B,n){
R=matrix(data=0,nrow=n,ncol=n)
for (i in 1:n)
for (j in … Read more
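The excerpt is truncated above, so the loop body below is an assumption: a hedged, self-contained sketch of what the comparison typically looks like, pitting a plain triple for-loop against the built-in %*% operator (n is raised to 100 so the timing difference is visible).

# sketch: explicit-loop matrix multiplication vs. vectorized %*%
n <- 100
A <- matrix(runif(n*n, 1, 1000), nrow=n, ncol=n)
B <- matrix(runif(n*n, 1, 100), nrow=n, ncol=n)

MultiplyMatrices <- function(A, B, n){
  R <- matrix(data=0, nrow=n, ncol=n)
  for (i in 1:n)
    for (j in 1:n)
      for (k in 1:n)
        R[i,j] <- R[i,j] + A[i,k]*B[k,j]
  R
}

system.time(A %*% B)                    # vectorized: near-instant
system.time(MultiplyMatrices(A, B, n))  # explicit loops: far slower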

Everyday R Code (18) – matrix calculation

Calculation 1: subtract different values from multiple columns

b <- matrix(rep(1:20), nrow=4, ncol=5)
c <- c(1,2,4)
b
c

for(i in 1:nrow(b)) {
  b[i,3:5] <- b[i,3:5] - c
}
b

Calculation 2: subtract a matrix from multiple columns of another matrix

b <- matrix(rep(1:20), nrow=4, ncol=5)
d <- matrix(rep(2:21), nrow=4, ncol=5)
b
d

for(i in 1:nrow(b)) {
  b[i,3:5] … Read more
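As a hedged aside (not part of the original post), the same column-wise subtraction can be done without a loop using sweep(); the name c_vec is used here only to avoid shadowing the c() function.

# vectorized alternative: subtract c_vec from columns 3:5 in one step
b <- matrix(rep(1:20), nrow=4, ncol=5)
c_vec <- c(1, 2, 4)
b[, 3:5] <- sweep(b[, 3:5], 2, c_vec)   # default FUN is "-"
b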

Everyday R code (17) Pivot table in R to replace Excel

Pivot table in R to replace Excel

Step 1
#converting data format using data.table
library(data.table)
live <- data.table(live)

Step 2
#finding unique count for each bucket using list
uniqueData <- live[,list(Unique_user_Count=length(unique(User_ID))),by=list(Market, Company, Group)]

Step 3
#pivot the table using the dcast function in the reshape2 package
#install the package if you haven't
install.packages("reshape2")
library(reshape2)
pivot <- dcast(uniqueData, Market+Company ~ Group, value.var="Unique_user_Count", fun.aggregate=sum)
… Read more
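For reference, here is a hedged, self-contained sketch of the same three steps on a tiny made-up data set (the Market/Company/Group/User_ID values are invented for illustration):

library(data.table)
library(reshape2)

live <- data.table(
  Market  = c("US","US","US","EU","EU"),
  Company = c("A","A","B","A","B"),
  Group   = c("G1","G2","G1","G1","G2"),
  User_ID = c(1, 1, 2, 3, 4)
)

# Step 2: unique user count per bucket
uniqueData <- live[, list(Unique_user_Count=length(unique(User_ID))),
                   by=list(Market, Company, Group)]

# Step 3: pivot so each Group becomes a column
pivot <- dcast(uniqueData, Market + Company ~ Group,
               value.var="Unique_user_Count", fun.aggregate=sum)
pivot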

Everyday R code (16) survey question selection technique

There are several methods we can use to select questions from a long list of survey questions.

1. Correlation
If 2 questions are highly correlated with each other, 1 question is enough to collect the information we need.

2. Factor Analysis
If 2 questions point in a similar direction (load on the same underlying factor), 1 question is enough to collect … Read more
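A hedged sketch of the correlation idea (the survey data and the 0.8 cutoff are assumptions, not from the original post): compute the correlation matrix and flag question pairs correlated strongly enough that one of the two could be dropped.

set.seed(1)
q1 <- sample(1:5, 100, replace=TRUE)
q2 <- q1 + sample(-1:1, 100, replace=TRUE)   # nearly duplicates q1
q3 <- sample(1:5, 100, replace=TRUE)
survey <- data.frame(q1, q2, q3)

cors <- cor(survey)
which(abs(cors) > 0.8 & upper.tri(cors), arr.ind=TRUE)   # pairs where one question may be enough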

Everyday R code (15) Text Mining

There are several techniques for text mining.

Word Cloud: https://www.r-bloggers.com/word-cloud-in-r/
Association rule: http://www.rdatamining.com/examples/association-rules
k-means clustering: https://www.r-bloggers.com/clustering-search-keywords-using-k-means-clustering/
LDA topic modeling: A gentle introduction to topic modeling using R
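For the last item, a minimal LDA sketch using the topicmodels package (the toy documents and k=2 are assumptions for illustration):

library(tm)
library(topicmodels)

docs <- c("dogs chase cats in the park",
          "cats and dogs make good pets",
          "stocks and bonds gained value today",
          "investors bought bonds and stocks")
corpus <- VCorpus(VectorSource(docs))
dtm <- DocumentTermMatrix(corpus,
                          control=list(removePunctuation=TRUE, stopwords=TRUE))

lda <- LDA(dtm, k=2, control=list(seed=123))
terms(lda, 3)   # top 3 terms per topic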

Everyday R code (14) sentiment analysis

########### sentiment analysis ################
## use the RTextTools package ## this one works well ####
# you need 3 files with the following format

# positive
# comment                         flag
# like it                         0
# good job                        0
# great!                          0

# negative
# comment                         flag
# Disappointed that there are …   1
# You make a you                  1
# Pretty difficult                1
########################################
pos_tweets = read.csv("positive.csv",header=T,stringsAsFactors = FALSE)
neg_tweets = read.csv("negative.csv",header=T,stringsAsFactors … Read more
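The excerpt stops at reading the files, so the continuation below is a hedged sketch of how an RTextTools workflow typically proceeds (the tiny example data, the 6/2 train/test split, and the SVM choice are all assumptions; RTextTools may need to be installed from an archive):

library(RTextTools)

tweets <- data.frame(
  comment = c("like it", "good job", "great!", "really enjoy it",
              "disappointed", "pretty difficult", "does not work", "waste of time"),
  flag    = c(0, 0, 0, 0, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)
set.seed(1)
tweets <- tweets[sample(nrow(tweets)), ]   # shuffle before splitting

doc_matrix <- create_matrix(tweets$comment, language="english",
                            removeStopwords=TRUE, removePunctuation=TRUE)
container  <- create_container(doc_matrix, tweets$flag,
                               trainSize=1:6, testSize=7:8, virgin=FALSE)
model   <- train_model(container, "SVM")
results <- classify_model(container, model)
results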

Everyday R code (13)

Suppose the data set regular has 2 columns, 'ID' and 'Answer', but 'ID' is not unique; one ID can have 2 or more Answers. We want to put all the Answers for the same ID into one cell, grouping them into one row per ID. aggregate is useful for this case.

regular … Read more
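A minimal sketch on made-up data (the ID/Answer values are invented; the original regular data set is only partially shown in the post):

regular <- data.frame(ID=c(1,1,2,3,3,3),
                      Answer=c("a","b","c","d","e","f"),
                      stringsAsFactors=FALSE)

grouped <- aggregate(Answer ~ ID, data=regular,
                     FUN=function(x) paste(x, collapse="; "))
grouped
#   ID  Answer
# 1  1    a; b
# 2  2       c
# 3  3 d; e; f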

Everyday R code (12)

###################################
#### Writing data into excel function #########
###################################
# Method one
# Write the first data set in a new workbook
write.xlsx(Data1, file="exportedata.xlsx", sheetName="USA-ARRESTS", append=FALSE)
# Add a second data set in a new worksheet
write.xlsx(Data2, file="exportedata.xlsx", sheetName="MTCARS", append=TRUE)
# Add a third data set
write.xlsx(Data3, file="exportedata.xlsx", sheetName="TITANIC", append=TRUE)

# Method two
# file : the path to the output file … Read more
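A self-contained sketch of method one, assuming Data1/Data2/Data3 correspond to the built-in USArrests, mtcars and Titanic data sets (guessed from the sheet names) and that write.xlsx comes from the xlsx package:

library(xlsx)

Data1 <- USArrests
Data2 <- mtcars
Data3 <- as.data.frame(Titanic)

write.xlsx(Data1, file="exportedata.xlsx", sheetName="USA-ARRESTS", append=FALSE)
write.xlsx(Data2, file="exportedata.xlsx", sheetName="MTCARS", append=TRUE)
write.xlsx(Data3, file="exportedata.xlsx", sheetName="TITANIC", append=TRUE)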

Everyday R code (11)-Association rule/Market basket

##Association rule or Market basket
library("arules")

removewords=c(names(termFrequency)[which(termFrequency==1)],'en','f','nicht','es','luck','giving','thought','value','indeed','almost','apparently','exist','d','net','ture','dans','des','et','ne','une','le')

VerbList=sapply(DATA_.input, function(x){strsplit(x[[1]],' ')})
VerbList=sapply(VerbList, function(x){
  Idx=which(x=="" | x %in% removewords)
  if(length(Idx)>0) x=x[-Idx] else x=x
  x=unique(x)
})
VerbList=sapply(VerbList, function(x){paste(x,collapse=',')})
temp=which(VerbList=='')
VerbList=VerbList[-temp]
head(VerbList)

write(VerbList, file='C:\\Users\\folder\\Desktop\\VerbList_a')
verbWordList <- read.transactions("C:\\Users\\folder\\Desktop\\VerbList_a", format="basket", sep=",")

rules <- apriori(verbWordList, parameter = list(support = 0.01, confidence = 0.01, minlen=2))
rules.sorted <- sort(rules, by="support")
inspect(rules.sorted)
#inspect(rules.sorted[1:5])

if(length(rules.sorted)>0){
  rules.table=list(Keywords=lapply(1:length(rules.sorted), function(i){
    wlist=do.call('c', c(LIST(lhs(rules.sorted[i])), LIST(rhs(rules.sorted[i]))))
  }), quality=quality(rules.sorted))
}
#changed … Read more
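Since the keyword data above is not available, here is a minimal, self-contained sketch of the same read.transactions + apriori workflow on a tiny made-up basket file (the items and thresholds are invented for illustration):

library(arules)

baskets <- c("milk,bread,butter",
             "bread,butter",
             "milk,bread",
             "butter,jam,milk")
tmp <- tempfile()
writeLines(baskets, tmp)

trans <- read.transactions(tmp, format="basket", sep=",")
rules <- apriori(trans, parameter=list(support=0.25, confidence=0.5, minlen=2))
inspect(sort(rules, by="support"))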

Everyday R code (9)

# Word clouds are easy.
# We need the cleanDescription.r file as follows. It is used to map similar words
# to one word -- for example, to write games and gaming as game. You can add lots
# of such mappings to it.
require(tm)
cleanDescription <- function(description, additional.stopwords=NULL) {
# convert to lower case
description <- tolower(description)
## remove non-character symbols
description … Read more
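The post is truncated, so here is a hedged, self-contained sketch of the overall idea: fold similar words into one (games/gaming into game) and plot a word cloud; the example descriptions and the gsub mapping are illustrations, not the original cleanDescription.r code.

library(tm)
library(wordcloud)

description <- c("Great game", "fun games!", "love gaming", "great fun")
description <- tolower(description)
description <- gsub("games|gaming", "game", description)   # write similar words as one word

corpus <- VCorpus(VectorSource(description))
corpus <- tm_map(corpus, removePunctuation)
tdm  <- TermDocumentMatrix(corpus)
freq <- sort(rowSums(as.matrix(tdm)), decreasing=TRUE)
wordcloud(names(freq), freq, min.freq=1, random.order=FALSE)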