This survey focuses on some specific problems in distribution testing, taking goodness-of-fit as a running example. It does not aim to provide a comprehensive summary of all the topics in the area, but it does provide self-contained proofs and derivations of the main results, and tries to highlight the unifying techniques.
This survey is meant as an introduction and detailed overview of some topics in distribution testing, an area of theoretical computer science which falls under the general umbrella of property testing, and sits at the intersection of computational learning, statistical learning and hypothesis testing, information theory, and (depending on whom one asks) the theory of machine learning. There are several other resources you may want to read on this topic, starting with the short introductory survey "Taming big probability distributions", XRDS: Crossroads, ACM (2012), or this other survey, "A Survey on Distribution Testing: Your Data is Big. But is it Blue?", Theory of Computing (2020). This book differs from the previous ones in that it is (1) more recent; (2) more specific, focusing on a subset of questions and using them as guiding examples, instead of depicting as broad a landscape as possible (but from afar); (3) more detailed, including proofs and derivations; and (4) written with the objective of bringing together the theoretical computer science, statistics, and information theory viewpoints. Of course, I cannot promise I succeeded; but that was the intent, and you'll be the judge of the result.
@article{CanonneTopicsDT2022,
url = {http://dx.doi.org/10.1561/0100000114},
year = {2022},
volume = {19},
journal = {Foundations and Trends® in Communications and Information Theory},
title = {Topics and Techniques in Distribution Testing: A Biased but Representative Sample},
doi = {10.1561/0100000114},
issn = {1567-2190},
number = {6},
pages = {1032--1198},
author = {Clément L. Canonne}
}