Everyday R Code (19) – A function example to compare loops and vectorization speed

#to compare speed of 2 methods: loops and vectorization
n=10
#generate 100 numbers between 1 and 1000, and then make a 10 by 10 matrix
A=matrix(runif(100,1,1000),nrow=n,ncol=n)
B=matrix(runif(100,1,100),nrow=n,ncol=n)

#method 1
A%*%B
#get system.time to compare with the other method
system.time(A%*%B)

#method 2
#using a function
MultiplyMatrices=function(A,B,n){
R=matrix(data=0,nrow=n,ncol=n)
for (i in 1:n)
for (j in … Read more
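The excerpt is truncated above, so the loop body below is an assumption: a hedged, self-contained sketch of what the comparison typically looks like, pitting a plain triple for-loop against the built-in %*% operator (n is raised to 100 so the timing difference is visible).

# sketch: explicit-loop matrix multiplication vs. vectorized %*%
n <- 100
A <- matrix(runif(n*n, 1, 1000), nrow=n, ncol=n)
B <- matrix(runif(n*n, 1, 100), nrow=n, ncol=n)

MultiplyMatrices <- function(A, B, n){
  R <- matrix(data=0, nrow=n, ncol=n)
  for (i in 1:n)
    for (j in 1:n)
      for (k in 1:n)
        R[i,j] <- R[i,j] + A[i,k]*B[k,j]
  R
}

system.time(A %*% B)                    # vectorized: near-instant
system.time(MultiplyMatrices(A, B, n))  # explicit loops: far slower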

Everyday R Code (18) – matrix calculation

Calculation 1: subtract different values from multiple columns

b <- matrix(rep(1:20), nrow=4, ncol=5)
c <- c(1,2,4)
b
c

for(i in 1:nrow(b)) {
  b[i,3:5] <- b[i,3:5] - c
}
b

Calculation 2: subtract a matrix from multiple columns of another matrix

b <- matrix(rep(1:20), nrow=4, ncol=5)
d <- matrix(rep(2:21), nrow=4, ncol=5)
b
d

for(i in 1:nrow(b)) {
  b[i,3:5] … Read more
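As a hedged aside (not part of the original post), the same column-wise subtraction can be done without a loop using sweep(); the name c_vec is used here only to avoid shadowing the c() function.

# vectorized alternative: subtract c_vec from columns 3:5 in one step
b <- matrix(rep(1:20), nrow=4, ncol=5)
c_vec <- c(1, 2, 4)
b[, 3:5] <- sweep(b[, 3:5], 2, c_vec)   # default FUN is "-"
b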

Everyday R code (17) Pivot table in R to replace Excel

Pivot table in R to replace Excel

Step 1
#converting data format using data.table
library(data.table)
live <- data.table(live)

Step 2
#finding unique count for each bucket using list
uniqueData <- live[,list(Unique_user_Count=length(unique(User_ID))),by=list(Market, Company, Group)]

Step 3
#pivot the table using the dcast function in the reshape2 package
#install the package if you haven't
install.packages("reshape2")
library(reshape2)
pivot <- dcast(uniqueData, Market+Company ~ Group, value.var="Unique_user_Count", fun.aggregate=sum)
… Read more
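For reference, here is a hedged, self-contained sketch of the same three steps on a tiny made-up data set (the Market/Company/Group/User_ID values are invented for illustration):

library(data.table)
library(reshape2)

live <- data.table(
  Market  = c("US","US","US","EU","EU"),
  Company = c("A","A","B","A","B"),
  Group   = c("G1","G2","G1","G1","G2"),
  User_ID = c(1, 1, 2, 3, 4)
)

# Step 2: unique user count per bucket
uniqueData <- live[, list(Unique_user_Count=length(unique(User_ID))),
                   by=list(Market, Company, Group)]

# Step 3: pivot so each Group becomes a column
pivot <- dcast(uniqueData, Market + Company ~ Group,
               value.var="Unique_user_Count", fun.aggregate=sum)
pivot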

Everyday R code (16) survey question selection technique

There are several methods we can use to select questions from a long list of survey questions.

1. Correlation
If 2 questions are highly correlated with each other, 1 question is enough to collect the information we need.

2. Factor Analysis
If 2 questions point in a similar direction (load on the same underlying factor), 1 question is enough to collect … Read more
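A hedged sketch of the correlation idea (the survey data and the 0.8 cutoff are assumptions, not from the original post): compute the correlation matrix and flag question pairs correlated strongly enough that one of the two could be dropped.

set.seed(1)
q1 <- sample(1:5, 100, replace=TRUE)
q2 <- q1 + sample(-1:1, 100, replace=TRUE)   # nearly duplicates q1
q3 <- sample(1:5, 100, replace=TRUE)
survey <- data.frame(q1, q2, q3)

cors <- cor(survey)
which(abs(cors) > 0.8 & upper.tri(cors), arr.ind=TRUE)   # pairs where one question may be enough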

Everyday R code (15) Text Mining

There are several techniques for text mining.

Word Cloud: https://www.r-bloggers.com/word-cloud-in-r/
Association rule: http://www.rdatamining.com/examples/association-rules
k-means clustering: https://www.r-bloggers.com/clustering-search-keywords-using-k-means-clustering/
LDA topic modeling: A gentle introduction to topic modeling using R
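For the last item, a minimal LDA sketch using the topicmodels package (the toy documents and k=2 are assumptions for illustration):

library(tm)
library(topicmodels)

docs <- c("dogs chase cats in the park",
          "cats and dogs make good pets",
          "stocks and bonds gained value today",
          "investors bought bonds and stocks")
corpus <- VCorpus(VectorSource(docs))
dtm <- DocumentTermMatrix(corpus,
                          control=list(removePunctuation=TRUE, stopwords=TRUE))

lda <- LDA(dtm, k=2, control=list(seed=123))
terms(lda, 3)   # top 3 terms per topic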

Everyday R code (14) sentiment analysis

########### sentiment analysis ################
## use the RTextTools package ## this one works well ####
# you need 3 files with the following format

# positive
# comment                         flag
# like it                         0
# good job                        0
# great!                          0

# negative
# comment                         flag
# Disappointed that there are …   1
# You make a you                  1
# Pretty difficult                1
########################################
pos_tweets = read.csv("positive.csv",header=T,stringsAsFactors = FALSE)
neg_tweets = read.csv("negative.csv",header=T,stringsAsFactors … Read more
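The excerpt stops at reading the files, so the continuation below is a hedged sketch of how an RTextTools workflow typically proceeds (the tiny example data, the 6/2 train/test split, and the SVM choice are all assumptions; RTextTools may need to be installed from an archive):

library(RTextTools)

tweets <- data.frame(
  comment = c("like it", "good job", "great!", "really enjoy it",
              "disappointed", "pretty difficult", "does not work", "waste of time"),
  flag    = c(0, 0, 0, 0, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)
set.seed(1)
tweets <- tweets[sample(nrow(tweets)), ]   # shuffle before splitting

doc_matrix <- create_matrix(tweets$comment, language="english",
                            removeStopwords=TRUE, removePunctuation=TRUE)
container  <- create_container(doc_matrix, tweets$flag,
                               trainSize=1:6, testSize=7:8, virgin=FALSE)
model   <- train_model(container, "SVM")
results <- classify_model(container, model)
results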

Everyday R code (13)

Suppose the data set regular has 2 columns, 'ID' and 'Answer', but 'ID' is not unique; one ID can have 2 or more Answers. We want to put all the Answers for the same ID into one cell, grouping them into one row per ID. aggregate is useful for this case.

regular … Read more
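A minimal sketch on made-up data (the ID/Answer values are invented; the original regular data set is only partially shown in the post):

regular <- data.frame(ID=c(1,1,2,3,3,3),
                      Answer=c("a","b","c","d","e","f"),
                      stringsAsFactors=FALSE)

grouped <- aggregate(Answer ~ ID, data=regular,
                     FUN=function(x) paste(x, collapse="; "))
grouped
#   ID  Answer
# 1  1    a; b
# 2  2       c
# 3  3 d; e; f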

Everyday R code (12)

###################################
#### Writing data into excel function #########
###################################
# Method one
# Write the first data set in a new workbook
write.xlsx(Data1, file="exportedata.xlsx", sheetName="USA-ARRESTS", append=FALSE)
# Add a second data set in a new worksheet
write.xlsx(Data2, file="exportedata.xlsx", sheetName="MTCARS", append=TRUE)
# Add a third data set
write.xlsx(Data3, file="exportedata.xlsx", sheetName="TITANIC", append=TRUE)

# Method two
# file : the path to the output file … Read more
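A self-contained sketch of method one, assuming Data1/Data2/Data3 correspond to the built-in USArrests, mtcars and Titanic data sets (guessed from the sheet names) and that write.xlsx comes from the xlsx package:

library(xlsx)

Data1 <- USArrests
Data2 <- mtcars
Data3 <- as.data.frame(Titanic)

write.xlsx(Data1, file="exportedata.xlsx", sheetName="USA-ARRESTS", append=FALSE)
write.xlsx(Data2, file="exportedata.xlsx", sheetName="MTCARS", append=TRUE)
write.xlsx(Data3, file="exportedata.xlsx", sheetName="TITANIC", append=TRUE)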

Everyday R code (11)-Association rule/Market basket

##Association rule or Market basket
library("arules")

removewords=c(names(termFrequency)[which(termFrequency==1)],'en','f','nicht','es','luck','giving','thought','value','indeed','almost','apparently','exist','d','net','ture','dans','des','et','ne','une','le')

VerbList=sapply(DATA_.input, function(x){strsplit(x[[1]],' ')})
VerbList=sapply(VerbList, function(x){
  Idx=which(x=="" | x %in% removewords)
  if(length(Idx)>0) x=x[-Idx] else x=x
  x=unique(x)
})
VerbList=sapply(VerbList, function(x){paste(x,collapse=',')})
temp=which(VerbList=='')
VerbList=VerbList[-temp]
head(VerbList)

write(VerbList, file='C:\\Users\\folder\\Desktop\\VerbList_a')
verbWordList <- read.transactions("C:\\Users\\folder\\Desktop\\VerbList_a", format="basket", sep=",")

rules <- apriori(verbWordList, parameter = list(support = 0.01, confidence = 0.01, minlen=2))
rules.sorted <- sort(rules, by="support")
inspect(rules.sorted)
#inspect(rules.sorted[1:5])

if(length(rules.sorted)>0){
  rules.table=list(Keywords=lapply(1:length(rules.sorted), function(i){
    wlist=do.call('c', c(LIST(lhs(rules.sorted[i])), LIST(rhs(rules.sorted[i]))))
  }), quality=quality(rules.sorted))
}
#changed … Read more
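Since the keyword data above is not available, here is a minimal, self-contained sketch of the same read.transactions + apriori workflow on a tiny made-up basket file (the items and thresholds are invented for illustration):

library(arules)

baskets <- c("milk,bread,butter",
             "bread,butter",
             "milk,bread",
             "butter,jam,milk")
tmp <- tempfile()
writeLines(baskets, tmp)

trans <- read.transactions(tmp, format="basket", sep=",")
rules <- apriori(trans, parameter=list(support=0.25, confidence=0.5, minlen=2))
inspect(sort(rules, by="support"))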

Everyday R code (9)

# Word clouds are easy.
# We need the cleanDescription.r file as follows. It is used to map similar words
# to one word -- for example, to write games and gaming as game. You can add lots
# of such mappings to it.
require(tm)
cleanDescription <- function(description, additional.stopwords=NULL) {
# convert to lower case
description <- tolower(description)
## remove non-character symbols
description … Read more
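The post is truncated, so here is a hedged, self-contained sketch of the overall idea: fold similar words into one (games/gaming into game) and plot a word cloud; the example descriptions and the gsub mapping are illustrations, not the original cleanDescription.r code.

library(tm)
library(wordcloud)

description <- c("Great game", "fun games!", "love gaming", "great fun")
description <- tolower(description)
description <- gsub("games|gaming", "game", description)   # write similar words as one word

corpus <- VCorpus(VectorSource(description))
corpus <- tm_map(corpus, removePunctuation)
tdm  <- TermDocumentMatrix(corpus)
freq <- sort(rowSums(as.matrix(tdm)), decreasing=TRUE)
wordcloud(names(freq), freq, min.freq=1, random.order=FALSE)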