Matrix Completion Methods For Causal Panel Data Models

Transcription

Matrix Completion Methods for Causal Panel Data Models
Susan Athey, Mohsen Bayati, Nikolay Doudchenko, Guido Imbens, & Khashayar Khosravi (Stanford University)
NBER Summer Institute, Cambridge, July 27th, 2017

We are interested in estimating the (average) effect of a binary treatment on a scalar outcome. We have data on $N$ units, for $T$ periods. We observe
– the treatment, $W_{it} \in \{0, 1\}$,
– the realized outcome $Y_{it}$,
– time-invariant characteristics of the units, $X_i$,
– unit-invariant characteristics of time, $Z_t$,
– time- and unit-specific characteristics, $V_{it}$.

We observe (in addition to covariates) the outcome matrix and the treatment matrix:
$$\mathbf{Y} = \begin{pmatrix} Y_{11} & Y_{12} & \dots & Y_{1T} \\ Y_{21} & Y_{22} & \dots & Y_{2T} \\ \vdots & \vdots & & \vdots \\ Y_{N1} & Y_{N2} & \dots & Y_{NT} \end{pmatrix} \text{ (outcome)}, \qquad \mathbf{W} = \begin{pmatrix} W_{11} & \dots & W_{1T} \\ \vdots & & \vdots \\ W_{N1} & \dots & W_{NT} \end{pmatrix} \text{ (treatment)}.$$
Rows are units, columns are time periods. (Important because some, but not all, methods treat units and time periods asymmetrically.)

In terms of potential outcomes, $\mathbf{Y}(0)$ and $\mathbf{Y}(1)$ are each partially observed: an entry of $\mathbf{Y}(0)$ is observed (X) where $W_{it} = 0$ and missing (?) where $W_{it} = 1$, and vice versa for $\mathbf{Y}(1)$. In order to estimate the average treatment effect for the treated (or another average, e.g., the overall average effect),
$$\tau = \frac{\sum_{i,t} W_{it}\,\bigl(Y_{it}(1) - Y_{it}(0)\bigr)}{\sum_{i,t} W_{it}},$$
we need to impute the missing potential outcomes in at least one of $\mathbf{Y}(0)$ and $\mathbf{Y}(1)$.
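Once the missing potential outcomes have been imputed, the estimand above is a simple weighted sum. A minimal NumPy sketch, using a made-up $3 \times 2$ panel (all numbers hypothetical):

```python
import numpy as np

# Hypothetical 3x2 panel: entries of W mark treated (i, t) cells. Y1 and Y0
# are the (partly imputed) potential-outcome matrices; all numbers made up.
W  = np.array([[0, 1],
               [0, 0],
               [0, 1]])
Y1 = np.array([[5.0, 9.0],
               [4.0, 6.0],
               [3.0, 8.0]])
Y0 = np.array([[5.0, 7.0],
               [4.0, 6.0],
               [3.0, 5.0]])

# tau = sum_{i,t} W_it (Y_it(1) - Y_it(0)) / sum_{i,t} W_it
tau = (W * (Y1 - Y0)).sum() / W.sum()
print(tau)  # (2 + 3) / 2 = 2.5
```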

Focus on the problem of imputing the missing entries in $\mathbf{Y}$ (either $\mathbf{Y}(0)$ or $\mathbf{Y}(1)$), an $N \times T$ matrix with some entries observed (X) and some missing (?). $\mathcal{O}$ and $\mathcal{M}$ are the sets of indices $(i,t)$ with $Y_{it}$ observed and missing, with cardinalities $|\mathcal{O}|$ and $|\mathcal{M}|$. Covariates may be time-specific, unit-specific, or time/unit-specific. This is a Matrix Completion Problem.

General set up:
$$\mathbf{Y}_{N\times T} = \mathbf{L}_{N\times T} + \boldsymbol{\varepsilon}_{N\times T}$$
Key assumption, "Matrix Unconfoundedness":
$$\mathbf{W}_{N\times T} \perp \boldsymbol{\varepsilon}_{N\times T} \mid \mathbf{L}_{N\times T}$$
(but $\mathbf{W}$ may depend on $\mathbf{L}$). In addition,
$$\mathbf{L}_{N\times T} \approx \mathbf{U}_{N\times R}\,\mathbf{V}_{T\times R}',$$
i.e., $\mathbf{L}$ is well approximated by a matrix with rank $R$ low relative to $N$ and $T$.

Classification of practical problems depending on
– magnitude of $T$ and $N$,
– pattern of missing data, fraction of observed data $|\mathcal{O}|/(|\mathcal{O}| + |\mathcal{M}|)$ close to zero or one.
Different structure on $\mathbf{L}$ in
– the average treatment effect under unconfoundedness literature,
– the synthetic control literature,
– the panel data / DID / fixed effect literature,
– the machine learning literature.

Classification of Problem I: Magnitude of N and T
Thin matrix ($N$ large, $T$ small), typical cross-section setting: many units, few time periods.
Fat matrix ($N$ small, $T$ large), time series setting: few units, many periods.
Or approximately square matrix, $N$ and $T$ of comparable magnitude.

Classification of Problem II: Pattern of Missing Data
Most of the econometric causal literature focuses on the case with a block of treated units / time periods:
$$\mathbf{Y}_{N\times T} = \begin{pmatrix} \mathbf{Y}_{C,\text{pre}}(0) & \mathbf{Y}_{C,\text{post}}(0) \\ \mathbf{Y}_{T,\text{pre}}(0) & \text{?} \end{pmatrix}$$
Easier because it allows for complete-data modeling of the conditional distribution of $\mathbf{Y}_{C,\text{post}}(0)$ given $\mathbf{Y}_{C,\text{pre}}(0)$ (matching) or the conditional distribution of $\mathbf{Y}_{T,\text{pre}}(0)$ given $\mathbf{Y}_{C,\text{pre}}(0)$ (synthetic control).

Two important special cases:
Single treated unit (Abadie et al., synthetic control): only the last row of $\mathbf{Y}_{N\times T}$ has missing entries.
Single treated period (most of the treatment effect literature): only the last column of $\mathbf{Y}_{N\times T}$ has missing entries.

Other Important Assignment Patterns
Staggered adoption (e.g., adoption of technology; Athey and Stern, 1998): rows of $\mathbf{Y}_{N\times T}$ range from never-adopters (fully observed) through late, medium, and early adopters, with missing entries starting earlier in the row for earlier adopters.

Netflix Problem
– Very, very large $N$ (number of individuals), large $T$ (number of movies); raises computational issues.
– General missing data pattern.
– Fraction of observed data is close to zero, $|\mathcal{O}| \ll |\mathcal{M}|$.

Fat Matrix: Vertical Regression
– Outcome: the target unit's outcome in period $t$.
– Covariates: other units' outcomes in the same period. An observation is a time period.
– What is stable: patterns across units.
– Identification: $Y_{N,t}(0) \perp W_{N,t} \mid Y_{1,t}, \dots, Y_{N-1,t}$.
– Examples: synthetic control ($\omega_i \ge 0$, $\sum_i \omega_i = 1$); Doudchenko-Imbens (estimate $\omega_i$ with elastic net).
Regression and imputation:
$$y_{N,t} = \omega_0 + \sum_{i < N} \omega_i\, y_{i,t} + \epsilon_t, \qquad \hat{y}_{N,T} = \hat\omega_0 + \sum_{i < N} \hat\omega_i\, y_{i,T}$$
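As a sketch of the vertical regression, here is a plain least-squares fit (without the nonnegativity and adding-up constraints of synthetic control, or the elastic-net penalty of Doudchenko-Imbens); the data and the weights are simulated for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, T0 = 5, 40, 30                   # unit N is treated from period T0 onward
Yc = rng.normal(size=(N - 1, T))       # control units' outcomes (rows: units 1..N-1)
w_true = np.array([0.5, 0.3, 0.1, 0.1])
Yn = w_true @ Yc + 0.01 * rng.normal(size=T)   # treated unit's untreated outcome path

# Vertical regression: each pretreatment period is one observation; regress
# Y_{N,t} on the contemporaneous outcomes of the other units.
X = np.column_stack([np.ones(T0), Yc[:, :T0].T])   # intercept omega_0 plus N-1 weights
coef, *_ = np.linalg.lstsq(X, Yn[:T0], rcond=None)
omega0, omega = coef[0], coef[1:]

# Impute the treated unit's untreated outcomes in the post-treatment periods.
Yn_hat = omega0 + omega @ Yc[:, T0:]
```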

Thin Matrix: Horizontal Regression
– Outcome: the target time period's outcome.
– Covariates: other time periods' outcomes for the same unit. An observation is a unit.
– What is stable: time patterns within a unit.
– Identification: $Y_{i,T}(0) \perp W_{i,T} \mid Y_{i,1}, \dots, Y_{i,T-1}$.
– Examples: matching / ATE literature (average outcomes from units with the most similar lagged outcomes $\mathbf{y}_{i,-T}$); with regularization, Chernozhukov et al., Athey, Imbens and Wager (2017). Closely related to transposed versions of synthetic control / elastic net.
Regression and imputation:
$$y_{i,T} = \sum_{t < T} \omega_t\, y_{i,t} + \epsilon_i, \qquad \hat{y}_{N,T} = \sum_{t < T} \hat\omega_t\, y_{N,t}$$
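The horizontal regression is the transposed sketch: each control unit is one observation and the lagged outcomes are the regressors. Here unconstrained least squares stands in for matching or a regularized fit, on simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 200, 5                          # thin matrix: many units, few periods
Y = rng.normal(size=(N, T))
w_true = np.array([0.2, 0.1, 0.4, 0.3])
Y[:, -1] = Y[:, :-1] @ w_true + 0.01 * rng.normal(size=N)

# Horizontal regression: each control unit is one observation; regress the
# final-period outcome on that unit's lagged outcomes. Treat unit N as the
# single treated unit and fit on the other N-1 (control) units.
coef, *_ = np.linalg.lstsq(Y[:-1, :-1], Y[:-1, -1], rcond=None)
y_hat = Y[-1, :-1] @ coef              # imputed Y_{N,T}(0) for the treated unit
```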

General Matrix: Matrix Regression (Panel)
Panel data regression. Exploit additive structure in unit and time effects:
$$\mathbf{Y}_{N\times T} = \mathbf{L}_{N\times T} + \boldsymbol{\epsilon}_{N\times T}, \qquad y_{i,t} = \gamma_i + \delta_t + \epsilon_{i,t}, \qquad \hat{y}_{N,T} = \hat\gamma_N + \hat\delta_T$$
Identification: $Y_{i,t}(0) \perp W_{i,t} \mid \gamma_i, \delta_t$.
Matrix formulation of identification: $\mathbf{Y}_{N\times T}(0) \perp \mathbf{W}_{N\times T} \mid \mathbf{L}_{N\times T}$.

II. How/why do we regularize:
Potentially many parameters when (i) vertical regression on a thin matrix, (ii) horizontal regression on a fat matrix, (iii) the matrix is approximately square.
"Regularization theory was one of the first signs of the existence of intelligent inference." (Vapnik, 1999, p. 9)
We need regularization to avoid overfitting. How you do the regularization is important for substantive and computational reasons: lasso/elastic-net/ridge are better than best subset in the simple regression setting.

Literature:

Regression  | No regularization (best subset)                          | Regularize ($\ell_1$/LASSO, $\ell_2$)
Horizontal  | earlier causal effect lit.                               | Chernozhukov et al., Athey et al.
Vertical    | Abadie-Diamond-Hainmueller                               | Doudchenko-Imbens & Abadie-L'Hour
Matrix      | two-way fixed effect literature, Bai (2003), Xu (2017)   | Current Paper

Econometric Literature I: Treatment Effect / Matching / Regression
– Thin matrix (many units, few periods), single treated period (period $T$).
– Strategy: use controls to regress $Y_{i,T}$ on lagged outcomes $Y_{i,1}, \dots, Y_{i,T-1}$. $N_C$ observations, $T - 1$ regressors.
– Does not work well if $\mathbf{Y}$ is fat (few units, many periods).
– Key identifying assumption: $Y_{iT}(0) \perp W_{iT} \mid Y_{i1}, \dots, Y_{i,T-1}$.

Econometric Literature II: Abadie-Diamond-Hainmueller Synthetic Control Literature
– Fat matrix, single treated unit (unit $N$), treatment starts in period $T_0$.
– Strategy: use pretreatment periods to regress $Y_{N,t}$ on contemporaneous outcomes $Y_{1,t}, \dots, Y_{N-1,t}$. $T_0 - 1$ observations, $N - 1$ regressors. Weights (regression coefficients) are nonnegative and sum to one, no intercept.
– Does not work well if the matrix is thin (many units).
– Key identifying assumption: $Y_{Nt}(0) \perp W_{Nt} \mid Y_{1t}, \dots, Y_{N-1,t}$.

Econometric Literature III: Doudchenko-Imbens
– Fat matrix or comparable $N$, $T$; single treated unit (unit $N$); treatment starts in period $T_0$.
– Strategy: use pretreatment periods to regress $Y_{N,t}$ on contemporaneous outcomes $Y_{1,t}, \dots, Y_{N-1,t}$, using elastic net regularization. $T_0 - 1$ observations, $N - 1$ regressors.
– Allows for negative weights, weights summing to something other than one, and a non-zero intercept; typically requires regularization.

Econometric Literature IV: Transposed Abadie-Diamond-Hainmueller or Doudchenko-Imbens (reverse the roles of time and units compared to ADH or DI)
– Thin matrix, single treated unit (unit $N$), treatment in period $T$.
– Strategy: use control units to regress $Y_{iT}$ on lagged outcomes $Y_{i1}, \dots, Y_{i,T-1}$, using elastic net regularization. $N_C$ observations, $T - 1$ regressors.
– Allows for negative weights, weights summing to something other than one, and a non-zero intercept.
– Similar to the regression estimator for the matching setting, with regularization.

Econometric Literature V: Fixed Effect Panel Data Literature / Difference-in-Differences
– $T$ and $N$ similar, general pattern for treatment assignment. Model:
$$Y_{it} = \alpha_i + \gamma_t + \varepsilon_{i,t}$$
– Symmetric in the roles of units and time periods.
– Suppose $T = 2$, $N = 2$, $W_{2,2} = 1$, and $W_{i,t} = 0$ if $(i,t) \neq (2,2)$; then we have a classic DID setting, leading to the imputed value
$$\hat{Y}_{2,2} = Y_{1,2} + Y_{2,1} - Y_{1,1}$$
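The classic $2 \times 2$ case can be checked directly (toy numbers):

```python
import numpy as np

# Classic 2x2 DID with toy numbers: only unit 2 in period 2 is treated, so we
# impute its untreated outcome from the other three cells.
Y = np.array([[10.0, 13.0],            # control unit: time trend of +3
              [ 8.0, np.nan]])         # treated unit: Y_{2,2}(0) is missing
Y_hat_22 = Y[1, 0] + (Y[0, 1] - Y[0, 0])   # Y_{2,1} + Y_{1,2} - Y_{1,1}
print(Y_hat_22)  # 11.0
```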

Questions: What to do if we are unsure about thin/fat/square, with staggered adoption or a general assignment mechanism?
– We generalize the interactive fixed effects model (Bai, 2003, 2009; Xu, 2017; Gobillon and Magnac, 2013; Kim and Oka, 2014), allowing for large rank of $\mathbf{L}$.
– We propose a new estimator with novel regularization:
– can deal with staggered/general missing data patterns,
– computationally feasible because it is a convex optimization problem,
– reduces to matching under assumptions in the thin case,
– reduces to synthetic control under assumptions in the fat case.

$X_i$ is a $P$-vector, $Z_t$ is a $Q$-vector. Model (generalized version of Xu, 2017):
$$Y_{it} = L_{it} + \sum_{p=1}^{P} \sum_{q=1}^{Q} X_{ip} H_{pq} Z_{qt} + \gamma_i + \delta_t + V_{it}\beta + \varepsilon_{it}$$
Unobserved: $L_{it}$, $\gamma_i$, $\delta_t$, $H_{pq}$, $\beta$, $\varepsilon_{it}$.

We do not necessarily need the fixed effects $\gamma_i$ and $\delta_t$; these can be subsumed into $\mathbf{L}$. If $L_{it} = \gamma_i + \delta_t$, then $\mathbf{L}$ is a rank-2 matrix:
$$\mathbf{L} = \boldsymbol{\gamma}_{N\times 1}\,\boldsymbol{\iota}_{T\times 1}' + \boldsymbol{\iota}_{N\times 1}\,\boldsymbol{\delta}_{T\times 1}' = \begin{pmatrix} \gamma_1 & 1 \\ \gamma_2 & 1 \\ \vdots & \vdots \\ \gamma_N & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 & \dots & 1 \\ \delta_1 & \delta_2 & \dots & \delta_T \end{pmatrix}$$
It may be convenient to include the fixed effects given that we regularize $\mathbf{L}$.
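The rank-2 claim is easy to verify numerically (illustrative values for the fixed effects):

```python
import numpy as np

# Additive fixed effects gamma_i + delta_t form a rank-2 matrix
# L = gamma iota_T' + iota_N delta' (illustrative values).
gamma = np.array([1.0, 2.0, 3.0, 4.0])   # unit effects, N = 4
delta = np.array([0.5, -1.0, 2.0])       # time effects, T = 3
L = np.add.outer(gamma, delta)           # L[i, t] = gamma[i] + delta[t]

print(np.linalg.matrix_rank(L))  # 2 (generically)
```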

Too many parameters (especially the $N \times T$ matrix $\mathbf{L}$), so we need regularization: we shrink $\mathbf{L}$ and $\mathbf{H}$ towards zero. For $\mathbf{H}$ we use a Lasso-type element-wise $\ell_1$ norm, defined as
$$\|\mathbf{H}\|_{1,e} = \sum_{p=1}^{P} \sum_{q=1}^{Q} |H_{pq}|.$$

How do we regularize $\mathbf{L}_{N\times T}$?
In linear regression with many regressors,
$$Y_i = \sum_{k=1}^{K} \beta_k X_{ik} + \varepsilon_i,$$
we often regularize by adding a penalty term $\lambda \|\beta\|$, where
$$\|\beta\| = \|\beta\|_0 = \sum_{k=1}^{K} \mathbf{1}\{\beta_k \neq 0\} \quad \text{(best subset selection)}$$
$$\|\beta\| = \|\beta\|_1 = \sum_{k=1}^{K} |\beta_k| \quad \text{(LASSO)}$$
$$\|\beta\| = \|\beta\|_2^2 = \sum_{k=1}^{K} \beta_k^2 \quad \text{(ridge)}$$

Matrix norms for the $N \times T$ matrix $\mathbf{L}$:
$$\mathbf{L}_{N\times T} = \mathbf{S}_{N\times N}\,\boldsymbol{\Sigma}_{N\times T}\,\mathbf{R}_{T\times T}' \quad \text{(singular value decomposition)}$$
$\mathbf{S}$, $\mathbf{R}$ unitary; $\boldsymbol{\Sigma}$ rectangular diagonal with entries $\sigma_i(\mathbf{L})$, the singular values. $\mathrm{rank}(\mathbf{L})$ is the number of non-zero $\sigma_i(\mathbf{L})$.
$$\|\mathbf{L}\|_F^2 = \sum_{i,t} L_{it}^2 = \sum_{j=1}^{\min(N,T)} \sigma_j^2(\mathbf{L}) \quad \text{(Frobenius, like ridge)}$$
$$\|\mathbf{L}\|_* = \sum_{j=1}^{\min(N,T)} \sigma_j(\mathbf{L}) \quad \text{(nuclear norm, like LASSO)}$$
$$\|\mathbf{L}\|_R = \mathrm{rank}(\mathbf{L}) = \sum_{j=1}^{\min(N,T)} \mathbf{1}\{\sigma_j(\mathbf{L}) > 0\} \quad \text{(rank, like best subset)}$$
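All three penalties can be read off the singular values; a quick NumPy check on a random rank-2 matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 8))   # a rank-2 matrix
sigma = np.linalg.svd(L, compute_uv=False)              # singular values

fro     = np.sqrt((sigma**2).sum())   # Frobenius norm (like ridge)
nuclear = sigma.sum()                 # nuclear norm   (like LASSO)
rank    = int((sigma > 1e-10).sum())  # rank           (like best subset)
```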

Xu (2017) focuses on the case with block assignment,
$$\mathbf{Y} = \begin{pmatrix} \mathbf{Y}_{C,\text{pre}} & \mathbf{Y}_{C,\text{post}} \\ \mathbf{Y}_{T,\text{pre}} & \text{?} \end{pmatrix}$$
Following Bai (2009), Xu fixes the rank $R$ of $\mathbf{L}$, so we can write $\mathbf{L}$ as a matrix with an $R$-factor structure:
$$\mathbf{L} = \mathbf{U}\mathbf{V}' = \begin{pmatrix} \mathbf{U}_C \\ \mathbf{U}_T \end{pmatrix} \begin{pmatrix} \mathbf{V}_{\text{pre}}' & \mathbf{V}_{\text{post}}' \end{pmatrix},$$
where $\mathbf{U}$ is $N \times R$ and $\mathbf{V}$ is $T \times R$.

Xu (2017) two-step method:
– First, use all controls to estimate $\mathbf{U}_C$, $\mathbf{V}_{\text{pre}}$, $\mathbf{V}_{\text{post}}$:
$$\min_{\mathbf{U}_C,\, \mathbf{V}_{\text{pre}},\, \mathbf{V}_{\text{post}}} \left\| \mathbf{Y}_C - \mathbf{U}_C \begin{pmatrix} \mathbf{V}_{\text{pre}}' & \mathbf{V}_{\text{post}}' \end{pmatrix} \right\|$$
– Second, use the treated units in the pre period to estimate $\mathbf{U}_T$ given $\hat{\mathbf{V}}_{\text{pre}}$:
$$\min_{\mathbf{U}_T} \left\| \mathbf{Y}_{T,\text{pre}} - \mathbf{U}_T \hat{\mathbf{V}}_{\text{pre}}' \right\|$$
– Choose the rank of $\mathbf{L}$ through cross-validation (equivalent to regularization through rank).

Two Issues
– Xu's approach does not work with staggered adoption (there may be only a few units that never adopt), or with a general assignment pattern.
– Xu's method is not efficient because it does not use the $\mathbf{Y}_{T,\text{pre}}$ data to estimate $\mathbf{V}$.

Modified Xu (2017) method:
$$\min_{\mathbf{L}} \frac{1}{|\mathcal{O}|} \sum_{(i,t) \in \mathcal{O}} (Y_{it} - L_{it})^2 + \lambda_L \|\mathbf{L}\|_R$$
– More efficient; uses all the data.
– Works with staggered adoption and general missing data patterns.
– Computationally intractable with large $N$ and $T$ because of the non-convexity of the objective function (like best subset selection in regression).

Our proposed method: regularize using the nuclear norm:
$$\hat{\mathbf{L}} = \arg\min_{\mathbf{L}} \frac{1}{|\mathcal{O}|} \sum_{(i,t) \in \mathcal{O}} (Y_{it} - L_{it})^2 + \lambda_L \|\mathbf{L}\|_*$$
– The nuclear norm $\|\cdot\|_*$ generally leads to a low-rank solution for $\mathbf{L}$, the way LASSO leads to selection of regressors.
– The problem is convex, so fast solutions are available.

Estimation: $\hat{\mathbf{L}}$ is obtained via the following procedure:
(1) Initialize $\hat{\mathbf{L}}_1 = \mathbf{0}_{N\times T}$.
(2) For $k = 1, 2, \dots$, repeat until convergence ($\mathcal{O}$ is where we observe $\mathbf{Y}$):
$$\hat{\mathbf{L}}_{k+1} = \mathrm{Shrink}_\lambda\!\bigl( P_{\mathcal{O}}(\mathbf{Y}) + P_{\mathcal{O}}^{\perp}(\hat{\mathbf{L}}_k) \bigr)$$
Here $P_{\mathcal{O}}$, $P_{\mathcal{O}}^{\perp}$, and $\mathrm{Shrink}_\lambda$ are matrix operators on $\mathbb{R}^{N\times T}$. For any $\mathbf{A}_{N\times T}$, $P_{\mathcal{O}}(\mathbf{A})$ is equal to $\mathbf{A}$ on $\mathcal{O}$ and equal to 0 outside of $\mathcal{O}$; $P_{\mathcal{O}}^{\perp}(\mathbf{A})$ is the opposite: equal to 0 on $\mathcal{O}$ and equal to $\mathbf{A}$ outside of $\mathcal{O}$.
For the SVD $\mathbf{A} = \mathbf{S}\boldsymbol{\Sigma}\mathbf{R}'$ with $\boldsymbol{\Sigma} = \mathrm{diag}(\sigma_1, \dots, \sigma_{\min(N,T)})$,
$$\mathrm{Shrink}_\lambda(\mathbf{A}) = \mathbf{S}\,\mathrm{diag}(\sigma_1 - \lambda, \dots, \sigma_* - \lambda, 0, \dots, 0)\,\mathbf{R}',$$
where $\sigma_*$ is the smallest singular value of $\mathbf{A}$ that is larger than $\lambda$. More details in Mazumder, Hastie, and Tibshirani (2010).
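A minimal NumPy sketch of this iteration (a toy noiseless rank-1 example; the value of $\lambda$ and the iteration count are illustrative, not tuned):

```python
import numpy as np

def shrink(A, lam):
    """Soft-threshold the singular values of A (the Shrink_lambda operator)."""
    S, sigma, Rt = np.linalg.svd(A, full_matrices=False)
    return S @ np.diag(np.maximum(sigma - lam, 0.0)) @ Rt

def soft_impute(Y, observed, lam, n_iter=500):
    """Iterate L <- Shrink_lam(P_O(Y) + P_O^perp(L)), starting from L = 0."""
    L = np.zeros_like(Y)
    for _ in range(n_iter):
        L = shrink(np.where(observed, Y, L), lam)
    return L

# Toy check: a noiseless rank-1 matrix with roughly 30% of entries missing.
rng = np.random.default_rng(0)
Y = np.outer(rng.normal(size=20), rng.normal(size=20))
observed = rng.random(Y.shape) > 0.3
L_hat = soft_impute(Y, observed, lam=0.01)
```

In practice one would stop on a convergence criterion rather than a fixed iteration count, and choose $\lambda$ by cross-validation as the slides describe.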

General Case: We estimate $\mathbf{H}$, $\mathbf{L}$, $\delta$, $\gamma$, and $\beta$ as
$$\min_{\mathbf{H},\mathbf{L},\delta,\gamma,\beta} \frac{1}{|\mathcal{O}|} \sum_{(i,t)\in\mathcal{O}} \Bigl( Y_{it} - L_{it} - \sum_{p=1}^{P}\sum_{q=1}^{Q} X_{ip} H_{pq} Z_{qt} - \gamma_i - \delta_t - V_{it}\beta \Bigr)^2 + \lambda_L \|\mathbf{L}\|_* + \lambda_H \|\mathbf{H}\|_{1,e}$$
– The same estimation procedure as before applies here, with an additional Shrink operator for $\mathbf{H}$.
– We choose $\lambda_L$ and $\lambda_H$ through cross-validation.

Additional Generalizations I: Allow for propensity score weighting to focus on fit where it matters.
Model the propensity score $E_{it} = \mathrm{pr}(W_{it} = 1 \mid X_i, Z_t, V_{it})$; $\mathbf{E}$ is an $N \times T$ matrix with typical element $E_{it}$, possibly estimated using matrix completion:
$$\hat{\mathbf{E}} = \arg\min_{\mathbf{E}} \frac{1}{NT} \sum_{i,t} (W_{it} - E_{it})^2 + \lambda_L \|\mathbf{E}\|_*$$
and then
$$\min_{\mathbf{L}} \frac{1}{|\mathcal{O}|} \sum_{(i,t)\in\mathcal{O}} \frac{(Y_{it} - L_{it})^2}{1 - \hat{E}_{it}} + \lambda_L \|\mathbf{L}\|_*$$

Additional Generalizations II: Take account of time series correlation in $\varepsilon_{it} = Y_{it} - L_{it}$: modify the objective function from the logarithm of a Gaussian likelihood based on independence to one with an autoregressive structure.

Adaptive Properties of Matrix Regression I
Suppose $N$ is large, $T$ is small, $W_{it} = 0$ if $t < T$ (ATE under unconfoundedness setting), and the data generating process is
$$Y_{iT} = \mu + \sum_{t=1}^{T-1} \alpha_t Y_{it} + \varepsilon_{iT}, \qquad \varepsilon_{iT} \perp (Y_{i1}, \dots, Y_{i,T-1}).$$
Then matrix regression $\approx$ horizontal regression, with $\gamma_i = 0$, $\delta = (0, 0, \dots, \mu)$, and the rank-$(T-1)$ matrix
$$\mathbf{L} = \begin{pmatrix} Y_{11} & Y_{12} & \dots & Y_{1,T-1} & \mu + \sum_{t=1}^{T-1} \alpha_t Y_{1t} \\ Y_{21} & Y_{22} & \dots & Y_{2,T-1} & \mu + \sum_{t=1}^{T-1} \alpha_t Y_{2t} \\ \vdots & \vdots & & \vdots & \vdots \\ Y_{N1} & Y_{N2} & \dots & Y_{N,T-1} & \mu + \sum_{t=1}^{T-1} \alpha_t Y_{Nt} \end{pmatrix} \quad (\text{rank } T-1)$$

Adaptive Properties of Matrix Regression II
Suppose $N$ is small, $T$ is large, single treated unit (synthetic control setting), and the data generating process is
$$Y_{Nt} = \mu + \sum_{i=1}^{N-1} \alpha_i Y_{it} + \varepsilon_{Nt}, \qquad \varepsilon_{Nt} \perp (Y_{1t}, \dots, Y_{N-1,t}).$$
Then matrix regression $\approx$ vertical regression, with $\gamma = (0, 0, \dots, \mu)$, $\delta_t = 0$, and the rank-$(N-1)$ matrix
$$\mathbf{L} = \begin{pmatrix} Y_{11} & Y_{12} & \dots & Y_{1T} \\ \vdots & \vdots & & \vdots \\ Y_{N-1,1} & Y_{N-1,2} & \dots & Y_{N-1,T} \\ \mu + \sum_{i=1}^{N-1} \alpha_i Y_{i1} & \mu + \sum_{i=1}^{N-1} \alpha_i Y_{i2} & \dots & \mu + \sum_{i=1}^{N-1} \alpha_i Y_{iT} \end{pmatrix} \quad (\text{rank } N-1)$$

Results I: Suppose there are no covariates (just $\mathbf{L}$), $\mathcal{O}$ is sufficiently random, and the $\varepsilon_{it} = Y_{it} - L_{it}$ are i.i.d. with variance $\sigma^2$. Recall $\|\mathbf{Y}\|_F = \sqrt{\sum_{i,t} Y_{it}^2}$ and $\|\mathbf{Y}\|_\infty = \max_{i,t} |Y_{it}|$.
Let $\mathbf{Y}$ be the matrix including all the missing values, e.g., $\mathbf{Y}(0)$. Our estimate $\hat{\mathbf{Y}}$ for $\mathbf{Y}$ is $\hat{\mathbf{L}}$. The estimated matrix $\hat{\mathbf{Y}}$ is close to $\mathbf{Y}$ in the following sense:
$$\frac{\|\hat{\mathbf{Y}} - \mathbf{Y}\|_F}{\|\mathbf{Y}\|_F} \le C\, \frac{\max\bigl(\sigma,\ \|\mathbf{Y}\|_\infty\bigr)}{\|\mathbf{Y}\|_F}\, \sqrt{\frac{\mathrm{rank}(\mathbf{L})\,(N+T)\ln(N+T)}{|\mathcal{O}|}}.$$
Often the number of observed entries $|\mathcal{O}|$ is of order $NT$, so if $\mathrm{rank}(\mathbf{L}) \ll \min(N,T)$ and $\|\mathbf{Y}\|_\infty/\|\mathbf{Y}\|_F \to 0$ as $NT$ grows, the error goes to 0. Adapting the analysis of Negahban and Wainwright (2012).

Results II
To get a confidence interval for $Y_{it}(1) - Y_{it}(0)$ (for a treated unit with $W_{it} = 1$), we need a confidence interval for $L_{it}$ and a distributional assumption on $\varepsilon_{it} = Y_{it}(0) - L_{it}$ (e.g., normal, $N(0, \sigma^2)$).
– To estimate $L_{it}$ consistently, and to have distributional results, we need $N$ and $T$ to be large (even when $\mathrm{rank}(\mathbf{L}) = 1$).
– We assume $\mathbf{L}_{N\times T}$ is a rank-$R$ matrix, with $R$ fixed as $N$ and $T$ increase. (This can probably be relaxed to let $R$ increase slowly.)

Large sample properties of $\hat{L}_{it}$, following Bai (2003). Decompose $\mathbf{L}$ as a rank-$R$ matrix:
$$\mathbf{L}_{N\times T} = \mathbf{U}_{N\times R}\,\mathbf{V}_{T\times R}'$$
Define
$$\Sigma_U = \frac{1}{N}\mathbf{U}'\mathbf{U}, \qquad \Omega_i = \sigma^2\, U_i'\,\Sigma_U^{-1}\, U_i, \qquad \Sigma_V = \frac{1}{T}\mathbf{V}'\mathbf{V}, \qquad \Psi_t = \sigma^2\, V_t'\,\Sigma_V^{-1}\, V_t.$$
Then
$$\left( \frac{1}{N}\Omega_i + \frac{1}{T}\Psi_t \right)^{-1/2} \bigl( \hat{L}_{it} - L_{it} \bigr) \xrightarrow{d} N(0, 1).$$

Illustrations
– To assess root-mean-squared error, not to get point estimates: we take a complete matrix $\mathbf{Y}$, drop some entries, and compare imputed to actual values.
– We compare five estimators:
– DID
– SC-ADH (Abadie-Diamond-Hainmueller)
– EN (Elastic Net, Doudchenko-Imbens)
– EN-T (Elastic Net Transposed, Doudchenko-Imbens)
– MC-NNM (Matrix Completion, Nuclear-Norm Minimization)

Illustration I: California Smoking Example
Take the Abadie-Diamond-Hainmueller California smoking data. Consider two settings.
– Case 1: Simultaneous adoption.

Illustration I: California Smoking Example
Take the Abadie-Diamond-Hainmueller California smoking data. Consider two settings.
– Case 2: Staggered adoption.
We report average RMSE for different ratios $T_0/T$.

[Figure: average RMSE for the California smoking example ($N = 38$, $T = 31$); left panel: simultaneous adoption, $N_t = 8$; right panel: staggered adoption, $N_t = 35$]

Illustrations II: Stock Market Data
Daily returns on 2400 stocks, for 3000 days. We pick $N$ stocks at random, for the first $T$ periods; this is our sample. We then pick $\lfloor N/2 \rfloor$ stocks at random from the sample, consider the simultaneous adoption case with $T_0 \in \{\lfloor 0.25T \rfloor, \lfloor 0.75T \rfloor\}$, impute the missing data, and compare to the actual data. We repeat this 5 times for two pairs of $(N, T)$: $(N, T) = (1000, 5)$ (thin) and $(N, T) = (5, 1000)$ (fat).

[Figure: average RMSE for the stock market data; left panel: thin, $(N, T) = (1000, 5)$; right panel: fat, $(N, T) = (5, 1000)$]

References

Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. "Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program." Journal of the American Statistical Association 105.490 (2010): 493-505.

Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. "Comparative politics and the synthetic control method." American Journal of Political Science 59.2 (2015): 495-510.

Abadie, Alberto, and Jeremy L'Hour. "A Penalized Synthetic Control Estimator for Disaggregated Data."

Athey, Susan, Guido W. Imbens, and Stefan Wager. "Efficient inference of average treatment effects in high dimensions via approximate residual balancing." arXiv preprint arXiv:1604.07125v3 (2016).

Bai, Jushan. "Inferential theory for factor models of large dimensions." Econometrica 71.1 (2003): 135-171.

Bai, Jushan. "Panel data models with interactive fixed effects." Econometrica 77.4 (2009): 1229-1279.

Bai, Jushan, and Serena Ng. "Determining the number of factors in approximate factor models." Econometrica 70.1 (2002): 191-221.

Candès, Emmanuel J., and Yaniv Plan. "Matrix completion with noise." Proceedings of the IEEE 98.6 (2010): 925-936.

Candès, Emmanuel J., and Benjamin Recht. "Exact matrix completion via convex optimization." Foundations of Computational Mathematics 9.6 (2009): 717.

Chamberlain, G., and M. Rothschild. "Arbitrage, factor structure, and mean-variance analysis on large asset markets." Econometrica 51 (1983): 1281-1304.

Chernozhukov, Victor, et al. "Double machine learning for treatment and causal parameters." arXiv preprint arXiv:1608.00060 (2016).

Doudchenko, Nikolay, and Guido W. Imbens. "Balancing, regression, difference-in-differences and synthetic control methods: A synthesis." NBER Working Paper No. w22791. National Bureau of Economic Research, 2016.

Gross, David. "Recovering low-rank matrices from few coefficients in any basis." IEEE Transactions on Information Theory 57.3 (2011).

Gobillon, Laurent, and Thierry Magnac. "Regional policy evaluation: Interactive fixed effects and synthetic controls." Review of Economics and Statistics 98.3 (2016): 535-551.

Keshavan, Raghunandan H., Andrea Montanari, and Sewoong Oh. "Matrix completion from a few entries." IEEE Transactions on Information Theory 56.6 (2010): 2980-2998.

Keshavan, Raghunandan H., Andrea Montanari, and Sewoong Oh. "Matrix completion from noisy entries." Journal of Machine Learning Research 11 (2010): 2057-2078.

Kim, D., and T. Oka. "Divorce law reforms and divorce rates in the USA: An interactive fixed-effects approach." Journal of Applied Econometrics 29 (2014): 231-245.

Mazumder, Rahul, Trevor Hastie, and Robert Tibshirani. "Spectral regularization algorithms for learning large incomplete matrices." Journal of Machine Learning Research 11 (2010): 2287-2322.

Moon, Hyungsik Roger, and Martin Weidner. "Linear regression for panel with unknown number of factors as interactive fixed effects." Econometrica 83.4 (2015): 1543-1579.

Liang, Dawen, et al. "Modeling user exposure in recommendation." Proceedings of the 25th International Conference on World Wide Web (2016).

Negahban, Sahand, and Martin Wainwright. "Estimation of (near) low-rank matrices with noise and high-dimensional scaling." Annals of Statistics 39.2 (2011): 1069-1097.

Negahban, Sahand, and Martin Wainwright. "Restricted strong convexity and weighted matrix completion: Optimal bounds with noise." Journal of Machine Learning Research 13 (2012): 1665-1697.

Recht, Benjamin. "A simpler approach to matrix completion." Journal of Machine Learning Research 12 (2011): 3413-3430.

Xu, Yiqing. "Generalized synthetic control method: Causal inference with interactive fixed effects models." Political Analysis 25.1 (2017): 57-76.
