Transcription
Matrix Factorization1
Recovering latent factors in a matrixn rowsm columnsv11 vij vnm2
Recovering latent factors in a matrixK*my1a1a2. amx2y2b1b2 bm.n*Kx1 v11 vij xnyn vnm3
What is this for?K*my1a1a2. amx2y2b1b2 bm.n*Kx1 v11 vij xnyn vnm4
MF for collaborative @iltering5
What is collaborative @iltering?
What is collaborative @iltering?
What is collaborative @iltering?
What is collaborative @iltering?
Recovering latent factors in a matrixn usersm moviesv11 vij vnmV[i,j] user i’s rating of movie j11
Recovering latent factors in a matrixm moviesm moviesy1a1a2. amx2y2b1b2 bm.n usersx1 v11 vij xnyn vnmV[i,j] user i’s rating of movie j12
13
MF for image modeling14
15
MF for imagesPC1y1a1a2. amx2y2b1b2 bm.1000 imagesx11000 * 10,000,0010,000 pixels2 prototypes v11 vij xnynPC2 vnmV[i,j] pixel j in image i
MF for modeling text17
The Neatest Little Guide to Stock Market Investing Investing For Dummies, 4th Edition The Little Book of Common Sense Investing: The OnlyWay to Guarantee Your Fair Share of Stock MarketReturns The Little Book of Value Investing Value Investing: From Graham to Buffett and Beyond Rich Dad’s Guide to Investing: What the Rich Invest in,That the Poor and the Middle Class Do Not! Investing in Real Estate, 5th Edition Stock Investing For Dummies Rich Dad’s Advisors: The ABC’s of Real EstateInvesting: The Secrets of Finding Hidden Profits MostInvestors tent-semantic-analysis-lsa-tutorial/
TFIDF counts would be latent-semantic-analysis-lsa-tutorial/
Recovering latent factors in a matrixdoc term matrixm termsy1a1a2. amx2y2b1b2 bm.n documentsx1 v11 vij xnyn vnmV[i,j] TFIDF score of term j in doc i20
Investing forreal estateRich Dad’sAdvisor’s: TheABCs of RealEstateInvestment
The little bookof commonsense investing: Neatest LittleGuide to StockMarketInvesting
MF is like clustering24
k- ‐means as MFindicators for rclusters1a1a210b1b2.n examples0Zoriginal data setcluster meansM. am bm v11 vij xnynX vnm
How do you do it?K*my1a1a2. amx2y2b1b2 bm.n*Kx1 v11 vij xnyn vnm26
KDD 2011talk pilfered from à .27
28
Recovering latent factors in a matrixry1a1a2x2y2b1b2.n usersx1m moviesm moviesW xnynH. am bm v11 vijV vnmV[i,j] user i’s rating of movie j29
30
31
32
for image denoising33
Matrix factorization as SGDstep sizewhy does this work?34
Matrix factorization as SGD - ‐ why doesthis work? Here’s the key claim:35
Checking the claimThink for SGD for logistic regression LR loss compare y and ŷ dot(w,x) similar but now update w (user weights) and x (movie weight)36
What loss functions are possible?N1, N2 - ‐ diagonalmatrixes, sort of like IDFfactors for the users/movies“generalized” KL- ‐divergence37
What loss functions are possible?38
What loss functions are possible?39
ALS alternating least squares40
KDD 2011talk pilfered from à .41
42
43
44
Similar to McDonnell et alwith perceptron learning45
Slow convergence .46
47
48
49
50
51
52
More detail . Randomly permute rows/cols of matrix Chop V,W,H into blocks of size d x d– m/d blocks in W, n/d blocks in H Group the data:– Pick a set of blocks with no overlapping rowsor columns (a stratum)– Repeat until all blocks in V are covered Train the SGD– Process strata in series– Process blocks within a stratum in parallel53
More detail .Z was V54
More detail .M Initialize W,H randomly– not at zero J Choose a random ordering (random sort) of the pointsin a stratum in each “sub- ‐epoch” Pick strata sequence by permuting rows and columnsof M, and using M’[k,i] as column index of row i insubepoch k Use “bold driver” to set step size:– increase step size when loss decreases (in an epoch)– decrease step size when loss increases Implemented in Hadoop and R/Snowfall55
56
Wall Clock Time8 nodes, 64 cores, R/snow57
58
59
60
61
Number of Epochs62
63
64
65
66
Varying rank100 epochs for all67
Hadoop scalabilityHadoop processsetup time startsto dominate68
Hadoop scalability69
70
Rich Dad's Guide to Investing: What the Rich Invest in, That the Poor and the Middle Class Do Not! Investing in Real Estate, 5th Edition Stock Investing For Dummies Rich Dad's Advisors: The ABC's of Real Estate Investing: The Secrets of Finding Hidden Profits Most Investors Miss