海外华人红娘

Deep learning from Andrew Ng, recommended

Updated November 20, 2017; posted November 9, 2017 by lilo

Link to YouTube tutorial

Everyday R Code (19) – A function example to compare loops and vectorization speed

September 13, 2017 by lilo

#to compare speed of 2 methods: loops and vectorization
n=10
#generate 100 numbers between 1 and 1000, and then make a 10 by 10 matrix
A=matrix(runif(100,1,1000),nrow=n,ncol=n)
B=matrix(runif(100,1,100),nrow=n,ncol=n)
#method 1
A%*%B
#get system.time to compare with the other method
system.time(A%*%B)
#method 2
#using a function
MultiplyMatrices=function(A,B,n){
  R=matrix(data=0,nrow=n,ncol=n)
  for (i in 1:n)
    for (j in … Read more
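The excerpt cuts off inside `MultiplyMatrices`, so the rest of that function is not shown here. As a rough illustration of the loop method only, here is a minimal Python sketch of a triple-loop multiplication with a timing call. The name `multiply_matrices`, the nested-list representation, and the timing code are assumptions made for illustration; they are not the post's R code. In Python, the vectorized counterpart to `A%*%B` would normally come from a numerical library, which is left out to keep the sketch dependency-free.

```python
import random
import time


def multiply_matrices(a, b, n):
    """Multiply two n-by-n matrices (lists of rows) with explicit loops."""
    result = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                result[i][j] += a[i][k] * b[k][j]
    return result


n = 10
# Random n-by-n matrices, mirroring the ranges used for A and B in the excerpt.
A = [[random.uniform(1, 1000) for _ in range(n)] for _ in range(n)]
B = [[random.uniform(1, 100) for _ in range(n)] for _ in range(n)]

# Time the loop method (a stand-in for R's system.time).
start = time.perf_counter()
R = multiply_matrices(A, B, n)
elapsed = time.perf_counter() - start
print(f"loop method took {elapsed:.6f} seconds")
```

With n=10 the difference between methods is tiny; a larger n makes the cost of the explicit loops much easier to see.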

Everyday R Code (18) – matrix calculation

September 7, 2017 by lilo

Calculation 1
Subtract a different value from each of several columns
b <- matrix(rep(1:20), nrow=4, ncol=5)
c <- c(1,2,4)
b
c
for(i in 1:nrow(b)) { b[i,3:5] <- b[i,3:5] - c }
b
Calculation 2
Subtract a matrix from a matrix across several columns
b <- matrix(rep(1:20), nrow=4, ncol=5)
d <- matrix(rep(2:21), nrow=4, ncol=5)
b
d
for(i in 1:nrow(b)) { b[i,3:5] … Read more
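For readers more used to Python, Calculation 1 can be sketched roughly as follows. This is an illustrative translation using plain nested lists, not the post's code; the variable names `n_rows` and `n_cols` are assumptions. Note that R fills a matrix column by column, which the list comprehension reproduces.

```python
# Build the same 4-by-5 matrix as R's matrix(1:20, nrow=4, ncol=5).
# R fills column by column, so row r (0-based) holds r+1, r+5, r+9, r+13, r+17.
n_rows, n_cols = 4, 5
b = [[r + 1 + n_rows * col for col in range(n_cols)] for r in range(n_rows)]
c = [1, 2, 4]

# Subtract c[0], c[1], c[2] from the 3rd, 4th and 5th columns of every row.
for row in b:
    for offset, value in enumerate(c):
        row[2 + offset] -= value

for row in b:
    print(row)
```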

Interview: Concordant, Discordant and Tied Pairs for model validation

August 31, 2017 by lilo

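What are concordant, discordant and tied pairs in model validation? A friend who interviewed with Amazon for a data-related position was asked this question. Here is a very clear explanation: http://www.listendata.com/2014/08/modeling-tips-calculating-concordant.html
The basic idea: put the observations whose actual value is 1 in one group (a of them) and those whose actual value is 0 in another (b of them), then take the Cartesian product to get a × b pairs. For each pair, compare the predicted value of the actual-1 observation with that of the actual-0 observation. If the actual-1 prediction is higher, the pair is concordant; if the actual-0 prediction is higher, it is discordant; if the two are equal, it is tied. A good model has many concordant pairs and few discordant and tied pairs; as a rule of thumb, a concordant share above 80% is considered good. … Read more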
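The pairing procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `pair_counts` and the sample data are made up for the example, and actual outcomes are assumed to be coded as 1 and 0.

```python
from itertools import product


def pair_counts(actual, predicted):
    """Count concordant, discordant and tied pairs over all (actual-1, actual-0) pairs."""
    ones = [p for y, p in zip(actual, predicted) if y == 1]
    zeros = [p for y, p in zip(actual, predicted) if y == 0]
    counts = {"concordant": 0, "discordant": 0, "tied": 0}
    # Cartesian product: every actual-1 prediction against every actual-0 prediction.
    for p_one, p_zero in product(ones, zeros):
        if p_one > p_zero:
            counts["concordant"] += 1
        elif p_one < p_zero:
            counts["discordant"] += 1
        else:
            counts["tied"] += 1
    return counts


# Hypothetical outcomes and model scores.
actual = [1, 1, 0, 0, 0]
predicted = [0.9, 0.4, 0.3, 0.4, 0.8]
counts = pair_counts(actual, predicted)
total = sum(counts.values())
print(counts, f"concordant share: {counts['concordant'] / total:.0%}")
```

The nested comparison is O(a × b), which is fine for small samples; large datasets would usually call for a rank-based shortcut instead.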

Everyday R Code (17) Pivot table in R to replace Excel

August 30, 2017 by lilo

Pivot table in R to replace Excel
Step 1
#converting data format using data.table
library(data.table)
live<-data.table(live)
Step 2
#finding unique count for each bucket using list
uniqueData<-live[,list(Unique_user_Count=length(unique(User_ID))),by=list(Market, Company,Group)]
Step 3
#pivot the table using dcast function in reshape2 package
#install the package if you haven't
install.packages("reshape2")
library(reshape2)
pivot<-dcast(uniqueData, Market+Company ~ Group , value.var="Unique_user_Count", fun.aggregate=sum) … Read more
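To make the two stages concrete (count unique users per bucket, then spread `Group` into columns), here is a small standard-library Python sketch of the same idea. The sample rows are hypothetical and the dictionary-based layout is an assumption for illustration; it is not a translation of what `data.table` or `dcast` do internally.

```python
from collections import defaultdict

# Hypothetical rows standing in for the `live` table:
# (Market, Company, Group, User_ID)
live = [
    ("US", "Acme", "A", 1),
    ("US", "Acme", "A", 1),  # same user again, counted once
    ("US", "Acme", "A", 2),
    ("US", "Acme", "B", 3),
    ("UK", "Bolt", "A", 4),
]

# Step 2: unique user count per (Market, Company, Group).
users = defaultdict(set)
for market, company, group, user_id in live:
    users[(market, company, group)].add(user_id)

# Step 3: pivot so each Group becomes a column; missing cells default to 0.
groups = sorted({group for _, _, group in users})
pivot = defaultdict(lambda: dict.fromkeys(groups, 0))
for (market, company, group), ids in users.items():
    pivot[(market, company)][group] += len(ids)

for key, row in sorted(pivot.items()):
    print(key, row)
```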

Everyday SQL (7)

Updated August 26, 2017; posted August 21, 2017 by lilo

SQL functions and expressions have many interesting uses; CASE WHEN, for example, is very powerful. I hope you find something interesting or useful in the code below. It was used in a real working environment to fill a business data request.
with vvvos as (
  select distinct map.ccc_o_group_id os
  from base.dddd_o_mapping map
  inner join base.dddd_ccc_o_groups groups … Read more
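Since the query itself is cut off, here is a separate, self-contained illustration of CASE WHEN, run through Python's built-in `sqlite3` module. The `orders` table, its columns, and the bucket thresholds are all hypothetical and have nothing to do with the tables in the post.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 20.0), (2, 150.0), (3, 900.0)],
)

# CASE WHEN evaluates the conditions in order and returns the first match.
rows = conn.execute(
    """
    SELECT id,
           CASE
               WHEN amount < 100 THEN 'small'
               WHEN amount < 500 THEN 'medium'
               ELSE 'large'
           END AS size_bucket
    FROM orders
    ORDER BY id
    """
).fetchall()
conn.close()

print(rows)
```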

Everyday SQL (6)

Updated August 26, 2017; posted May 31, 2017 by lilo

I ran into this at work: I got a pile of SQL code from another team, written four or five years ago by people who worked there. It looks like an old style that I had never seen before, so I am posting it here.
Q: What's the meaning of (+) in SQL queries (Oracle)?
A: It's Oracle's older syntax for an OUTER JOIN.
Example:
SELECT * FROM a, b WHERE b.id(+) = a.id
gives the same result as
SELECT * FROM a LEFT OUTER JOIN b ON b.id = a.id
Or we can … Read more
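To see what the LEFT OUTER JOIN form returns, here is a small sketch using Python's built-in `sqlite3` module. The contents of tables `a` and `b` are hypothetical. SQLite does not accept Oracle's `(+)` notation, so only the ANSI form is run here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE b (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO a VALUES (?, ?)", [(1, "one"), (2, "two"), (3, "three")])
conn.executemany("INSERT INTO b VALUES (?, ?)", [(1, "x"), (3, "z")])

# Every row of a is kept; rows with no match in b get NULL (None in Python).
rows = conn.execute(
    """
    SELECT a.id, a.name, b.val
    FROM a
    LEFT OUTER JOIN b ON b.id = a.id
    ORDER BY a.id
    """
).fetchall()
conn.close()

print(rows)
```

The row with id 2 has no partner in `b`, so it survives with an empty `val`; an inner join would have dropped it.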

Spyder did not work any more from Anaconda (02/18/2017). Here is the solution.

February 19, 2017 by lilo

Spyder stopped launching from Anaconda. Here is the solution, based on this post. Open your Terminal and use the command below. (There is a file called .spyder or something similar, and it might be corrupted. So we can rename it first, then open Spyder from Anaconda to generate a new … Read more
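The exact command and file name are not shown in this excerpt, so the following is only a generic sketch of the "rename it and let the application regenerate it" idea. The helper `set_aside` and the directory name `example_config` are hypothetical, and the demonstration runs on a throwaway temporary directory rather than on any real Spyder settings.

```python
import tempfile
from pathlib import Path


def set_aside(path):
    """Rename `path` to `<name>.bak` and return the new location."""
    source = Path(path)
    backup = source.with_name(source.name + ".bak")
    source.rename(backup)
    return backup


# Demonstration on a temporary directory, not on a real configuration folder.
with tempfile.TemporaryDirectory() as tmp:
    fake_config = Path(tmp) / "example_config"  # hypothetical name
    fake_config.mkdir()
    moved = set_aside(fake_config)
    # Record the outcome while the temporary directory still exists.
    result = (moved.name, moved.exists(), fake_config.exists())

print(result)
```

Renaming rather than deleting keeps the old settings around in case they need to be restored.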

Latest procedure to install Python and packages (Jan. 27, 2017)

January 27, 2017 by lilo

Happy Chinese New Year! I have to use a search tool to get information, and it's complicated. I need Python to finish this task. (I still have not figured it out, even with help from SDE and DS friends who are good at coding. I'll need my husband's help. 🙂 ) I tried on Windows and … Read more

Python question list

December 15, 2016 by lilo

If we have a probability for each value, how can we see the distribution of the values? We can use a histogram to get a rough view of the pdf (probability density function), then use KDE (Kernel Density Estimation) to fit the pdf. Reference: https://jakevdp.github.io/blog/2013/12/01/kernel-density-estimation/
Do you use random forest? If so, what's entropy? To measure the quality of a split: … Read more
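Both answers can be illustrated with a short standard-library Python sketch. It shows a Gaussian-kernel density estimate and the Shannon entropy of a set of class labels. The function names, the sample data, and the hand-picked bandwidth are assumptions for illustration; in practice a library implementation with automatic bandwidth selection would usually be preferable.

```python
import math
from collections import Counter


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x from the samples."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per sample, centred on that sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


data = [1.0, 1.2, 1.9, 2.1, 2.2, 4.8, 5.0]  # hypothetical sample
pdf = gaussian_kde(data, bandwidth=0.5)      # bandwidth chosen by hand
for x in (1.0, 2.0, 3.5, 5.0):
    print(f"density at {x}: {pdf(x):.3f}")

print(entropy([0, 0, 1, 1]))  # evenly mixed node: highest entropy for two classes
print(entropy([1, 1, 1, 1]))  # pure node: no uncertainty
```

A split that produces purer child nodes lowers the entropy, which is why entropy can serve as a split-quality criterion in tree-based models.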