Divide and Conquer Algorithms for Faster Machine Learning
General Material Designation
[Thesis]
First Statement of Responsibility
Izbicki, Michael
Subsequent Statement of Responsibility
Shelton, Christian R
PUBLICATION, DISTRIBUTION, ETC.
Date of Publication, Distribution, etc.
2017
DISSERTATION (THESIS) NOTE
Body granting the degree
Shelton, Christian R
Text preceding or following the note
2017
SUMMARY OR ABSTRACT
Text of Note
This thesis improves the scalability of machine learning by studying mergeable learning algorithms. In a mergeable algorithm, many processors independently solve the learning problem on small subsets of the data. A master processor then merges the solutions using only a single round of communication. Mergeable algorithms are popular because they are fast and easy to implement and offer strong privacy guarantees. Our first contribution is a novel fast cross-validation procedure suitable for any mergeable algorithm. The procedure's runtime is constant, independent of the number of folds, and it can be implemented on distributed systems. It is also widely applicable: we show that 32 recently proposed learning algorithms are mergeable and therefore fit our cross-validation framework. These learning algorithms come from many subfields of machine learning, including density estimation, regularized loss minimization, dimensionality reduction, submodular optimization, variational inference, and Markov chain Monte Carlo. We also provide two new mergeable learning algorithms. In the context of regularized loss minimization, existing merge procedures have either high bias or slow runtimes. We introduce the optimal weighted average (OWA) merge procedure, which achieves both low bias and a fast runtime. We also improve the cover tree data structure for fast nearest neighbor queries by providing a merge procedure. In doing so, we improve both the theoretical guarantees of the cover tree and its practical runtime. For example, the original cover tree was able to find nearest neighbors in time