Everyday R code (12)

################################### ####Writing data into excel function######### ################################### #Method one # Write the first data set in a new workbook write.xlsx(Data1, file=”exportedata.xlsx”,sheetName=”USA-ARRESTS”, append=FALSE) # Add a second data set in a new worksheet write.xlsx(Data2, file=”exportedata.xlsx”, sheetName=”MTCARS”,append=TRUE) # Add a third data set write.xlsx(Data3, file=”exportedata.xlsx”, sheetName=”TITANIC”,append=TRUE)   #Method two # file : the path to the output file … Read more

Everyday SQL (4)

ALTER SESSION ENABLE PARALLEL QUERY; ALTER SESSION SET NLS_DATE_FORMAT=’YYYY-MM-DD HH24:MI:SS’; select table1.contact_id,table1.contact_date,table1.people_login ,count(contact_id) over (partition by people_login order by table1.creation_datetime asc range numtodsinterval(365, ‘day’) preceding) as total_contacts ,sum(case when contact_type_code=’p’ then 1 else 0 end) over (partition by people_login order by table1.creation_datetime asc range numtodsinterval(365, ‘day’) preceding) as phone_contacts From contacts table1 where table1.marketplace=232435465 This … Read more

Everyday SQL (3)

–Common Table Expressions (not called Temporary table, many thanks to YZ to correct me, refer to http://www.sql-server-performance.com/2015/common-table-expressions-cte-developers/) with T as ( select Product_ID,max(case when MEDIA_LEG_RESULT=’T’ then 1 else 0 end) as T_leg from detail  data1 where NVL(disconnect_reason,’1′) NOT IN (‘SPAM’,’DUPLICATE’) and connect_date_datetime BETWEEN to_date(‘2016-07-01 00:00:00’, ‘YYYY-MM-DD HH24:MI:SS’) and to_date(‘2016-08-01 00:00:00’, ‘YYYY-MM-DD HH24:MI:SS’) and data1.Location_id=1 group … Read more

Everyday R code (11)-Association rule/Market basket

##Association rule or Market basket library(“arules”) removewords=c(names(termFrequency)[which(termFrequency==1)],’en’,’f’,’nicht’,’es’,’luck’,’giving’,’thought’,’value’,’indeed’,’almost’,’apparently’,’exist’,’d’,’net’,’ture’,’dans’,’des’,’et’,’ne’,’une’,’le’) VerbList=sapply(DATA_.input, function(x){strsplit(x[[1]],’ ‘)}) VerbList=sapply(VerbList, function(x){ Idx=which(x==”” | x %in% removewords) if(length(Idx)>0)x=x[-Idx] else x=x x=unique(x)} ) VerbList=sapply(VerbList,function(x){ paste(x,collapse=’,’)}) temp=which(VerbList==”) VerbList=VerbList[-temp] head(VerbList) write(VerbList,file=’C:\\Users\\folder\\Desktop\\VerbList_a’) verbWordList<- read.transactions(“C:\\Users\\folder\\Desktop\\VerbList_a”, format=”basket”,sep=”,”) rules <- apriori(verbWordList, parameter = list(support = 0.01,confidence = 0.01,minlen=2)) rules.sorted <- sort(rules, by=”support”) inspect(rules.sorted) #inspect(rules.sorted[1:5]) if(length(rules.sorted)>0){ rules.table=list(Keywords=lapply(1:length(rules.sorted), function(i){ wlist=do.call(‘c’,c(LIST(lhs(rules.sorted[i])),LIST(rhs(rules.sorted[i])))) }), quality=quality(rules.sorted)) } #changed … Read more

Everyday R code (9)

# word cloud is easy #we need cleanDescription.r file as follows. It was used to write some similar words into one word. For example, write games and gaming into game. You can write lots of them into it. require(tm) cleanDescription <- function(description,additional.stopwords=NULL) { # write to lower case description <- tolower(description) ##remove non-character symbols description … Read more

进口FOB

研究了一些产品,有商家给报价的时候特别说明是FOB,我去查了下什么是FOB。FOB(Free on Board)价格是说卖东西的人把货物送到船上的价格,就是说出口部分的程序已经cover,买家需要承担海上运输的风险和运到后的进口程序的费用,相当于在运到船上的时候,风险由卖家转给了买家。“在FOB条件下,卖方要领取出口许可证或其他官方证件,并负责办理出口手续,在成交时卖方还要自费提供证明其已按规定完成交货义务的证件,如果该证件并非运输单据,在买方要求下,并由买方承担风险和费用的情况下,卖方可以给予协助以取得提单或其他运输单据。” 在计算成本的时候,买家需要在FOB价格的基础上,加上自己要买的保险费用,从国外运到这里的运输费,还有入关的报关费用等。入关业务都有专门的报关公司可以承接,买家只需要交钱就可以了,当然需要准备很多的重要文件,保证进口合法,比如详细的产品介绍之类的,保证符合要求才行。另外,海关会检查,如果材料不齐全被查的几率会很大,一旦被查,费用动辄几千美金,都需要商家承担。进口商需要正规的公司注册,需要这些信息才能进口,年底需要报税。有些商品征收的税少,有些税多,都需要自己搞清楚。packing list, invoices, 提货单, 大概需要这些报关公司就可以帮着报关了。如果特殊商品需要特殊的文件,报关公司都会知道,问他们就好。 还有一种叫CIF(COST, INSRURANCE AND FREIGHT),就是卖方负责运费,保险费的费用,所以同样产品,这个价格条件的价格一般要高一些。 我打电话咨询了加州做货运的朋友,一般40 ft的柜子,如果从中国海运来,运费需要1700美元左右的样子。网上查到说一个20ft的柜子外围是8*8*20 ft, 可以放大概35个立方米,所以订货的时候也要问卖家买的东西大概一共是多少个立方,提前算好。 说起来很麻烦的样子,我想自己做一次就该很清楚了。不过我觉得要做的最好先找找当地的货源,进出口有时候还包含国家进出口退税什么的,比较麻烦,起步阶段风险越小越好,先从当地货源做起。

Everyday R code (8) logistic regression

# we have a data set with all numeric variables (some of the columns only have 0 or 1). And we want to run a logistic regression model. First we checked correlation and exclude the variables which has high correlation (remove one and keep the other). COR<-cor(Answered_data2) data<-subset(data,select=-c(A,C,D,AA,CC)) #we need to run correlation multiple times … Read more

Everyday R code (7)

#If some code does not work, we need to figure out if it is because some library is not loaded. Like filter, we need dplyr library to be loaded first. library(dplyr) T<-filter(P_HMD_TEST_PRED,Tabc==1) # we usually using conditions like this A<-filter(data, EE==1 & EE==2), how about we do not want the logic of and, we want … Read more

Everyday R code (6)

# R is simple to do data analyst work. If you want to write complicated loop and agrithem, that would be another story. Here is the code using R in a simple way.   #sometimes we need to install new packages, our R studio is running on the server. So it has a path for … Read more

Everyday R code (5)

#This is the most commonly used R code at work everyday. #read data from data.txt file and data2.txt file PP_C<-read.delim(“data1.txt”,header=T,sep=’\t’,comment.char=””,quote=””,row.names=NULL, stringsAsFactors= FALSE ) CES<-read.delim(“data2.txt”,header=T,sep=’\t’,comment.char=””,quote=””,row.names=NULL, stringsAsFactors= FALSE ) #check how many rows there nrow(PP_C) nrow(CES) #check how many unique COMM_ID there length(unique(PP_C$COMM_ID)) length(unique(CES$COMM_ID)) #check column names for each file colnames(PP_C) colnames(CES) #select specific columns to be … Read more