Speaker |
Wenxin Zhou |
Summary |
<p><span style="font-size: small;"><span lang="EN-US" style="font-family: 'Times New Roman', serif;">Abstract: Massive data are often contaminated by outliers and heavy-tailed errors. In the presence of heavy-tailed data, the finite-sample properties of least-squares-based methods, typified by the sample mean, are suboptimal both theoretically and empirically. To address this challenge, we propose adaptive Huber regression for robust estimation and inference. The key observation is that the robustification parameter should adapt to the sample size, dimension and moments to achieve an optimal tradeoff between bias and robustness. For heavy-tailed data with bounded (1 + δ)-th moment for some δ &gt; 0, we establish a sharp phase transition for robust estimation of regression parameters in both finite-dimensional and high-dimensional settings: when δ ≥ 1, the estimator achieves a sub-Gaussian rate of convergence without sub-Gaussian assumptions, while only a slower rate is available in the regime 0 &lt; δ &lt; 1, and the transition is smooth and optimal.</span></span></p>
<p class="MsoNormal"><span style="font-size: small;"><span lang="EN-US" style="font-family: 'Times New Roman', serif;">In addition, a non-asymptotic Bahadur representation and Wilks’ expansion for finite-sample inference are derived when higher moments exist. Building on these results, we take a further step toward developing uncertainty quantification methodologies, including the construction of confidence sets and large-scale simultaneous hypothesis testing. We demonstrate that adaptive Huber regression, combined with the multiplier bootstrap procedure, provides a useful robust alternative to the method of least squares. The idea of adaptivity is to let the data, which may be of low quality and exhibit heavy tails, influence the choice of method by which they are analyzed. Together, the theoretical and empirical results demonstrate the effectiveness of the proposed method and highlight the importance of statistical methods that are robust to violations of the assumptions underlying their use.</span></span></p> |