Seminar: "Subset selection for big data regression: an improved approach"
STATISTICS SEMINAR SERIES, FEBRUARY 2022
Speaker: Vasilis Chasiotis (Department of Statistics, AUEB, GR)
ABSTRACT:
In the big data era, researchers face a series of problems, and such data arise in many settings. Even standard methods such as linear regression can become difficult or problematic with huge volumes of data. For example, traditional regression approaches may break down on big datasets because of the large sample size, since they involve inverting huge data matrices, or simply because the data do not fit in memory. One simple remedy is to select a subset of the data (subdata) and run the regression on it. Several approaches in the existing literature select data points using information criteria and also provide algorithms for doing so. Some of these approaches are based on the combinatorial properties of orthogonal arrays. In the present work we aim to improve the algorithms proposed by these approaches. We describe such an approach and provide a new algorithm whose gains are demonstrated through simulation experiments and the analysis of real data. We also discuss the parameters of the proposed algorithm in order to clarify the trade-offs between execution time and information gain.
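The abstract does not spell out the improved algorithm, so the sketch below illustrates only the general idea it builds on: pick subdata that carry as much information as possible, measured here by the D-optimality criterion log det(Z^T Z), and fit ordinary least squares on that subdata alone. It uses a simplified version of the IBOSS (information-based optimal subdata selection) heuristic of Wang, Yang and Stufken, a swapped-in, well-known technique from the literature; it is NOT the speaker's algorithm and does not use orthogonal arrays. The function names (`iboss_select`, `design`, `log_det_information`, `fit_ols`), the subdata size `k` and the simulated data are hypothetical choices for illustration.

```python
import numpy as np


def iboss_select(X: np.ndarray, k: int) -> np.ndarray:
    """Simplified IBOSS-style selection (a sketch, NOT the speaker's algorithm).

    For each covariate j, pick the r = k // (2p) rows with the smallest and the
    r rows with the largest values of X[:, j] among rows not picked so far.
    Extreme points enlarge det(Z^T Z), the D-optimality criterion.
    """
    n, p = X.shape
    r = k // (2 * p)
    available = np.ones(n, dtype=bool)
    chosen = []
    for j in range(p):
        idx = np.flatnonzero(available)            # rows still eligible
        order = idx[np.argsort(X[idx, j])]         # eligible rows sorted by covariate j
        picked = np.concatenate([order[:r], order[-r:]])
        available[picked] = False                  # never pick the same row twice
        chosen.append(picked)
    return np.concatenate(chosen)


def design(X_sub: np.ndarray) -> np.ndarray:
    """Model matrix with an intercept column: Z = [1, X]."""
    return np.column_stack([np.ones(len(X_sub)), X_sub])


def log_det_information(X_sub: np.ndarray) -> float:
    """log det(Z^T Z): information about beta, up to the noise variance."""
    Z = design(X_sub)
    _, logdet = np.linalg.slogdet(Z.T @ Z)
    return float(logdet)


def fit_ols(X_sub: np.ndarray, y_sub: np.ndarray) -> np.ndarray:
    """Ordinary least squares fitted on the subdata only."""
    beta, *_ = np.linalg.lstsq(design(X_sub), y_sub, rcond=None)
    return beta


# Hypothetical simulated example: n = 100,000 rows, p = 5 covariates, k = 1,000.
rng = np.random.default_rng(0)
n, p, k = 100_000, 5, 1_000
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, 0.5, -0.5, 2.0, 0.0, 1.5])  # intercept first
y = beta_true[0] + X @ beta_true[1:] + rng.standard_normal(n)

sel = iboss_select(X, k)                               # information-based subdata
rnd = rng.choice(n, size=len(sel), replace=False)      # random subdata, same size

print("log det, IBOSS :", log_det_information(X[sel]))
print("log det, random:", log_det_information(X[rnd]))
print("beta_hat (IBOSS subdata):", fit_ols(X[sel], y[sel]).round(3))
```

A full sort is used per covariate for readability; replacing it with `np.argpartition` would give the linear-time selection that makes this kind of heuristic attractive at scale, which is one face of the execution-time versus information trade-off the abstract mentions.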
Room T102, New Building of the Athens University of Economics and Business (AUEB), 2 Troias Street
(You can view a PDF of the presentation here.)