Topics and Techniques in Distribution Testing

A Biased but Representative Sample

Clément Canonne University of Sydney
November, 2022

Abstract

This survey focuses on some specific problems in distribution testing, taking goodness-of-fit as a running example. In particular, it does not aim to provide a comprehensive summary of all the topics in the area; but will provide self-contained proofs and derivations of the main results, trying to highlight the unifying techniques.

What is this survey?

To quote and paraphrase the first chapter:

This survey [is meant] as an introduction and detailed overview of some topics in distribution testing, an area of theoretical computer science which falls under the general umbrella of property testing, and sits at the intersection of computational learning, statistical learning and hypothesis testing, information theory, and (depending on whom one asks) the theory of machine learning.

There are several other resources you may want to read about this topic, starting with this short introductory survey by Ronitt Rubinfeld, Taming big probability distributions XRDS: Crossroads, ACM (2012). or this other survey by, well, myself. A Survey on Distribution Testing: Your Data is Big. But is it Blue? Theory of Computing (2020). This book differs from the previous ones in that it is (1) more recent, (2) more specific, focusing on a subset of questions and using them as guiding examples, instead of depicting as broad a landscape as possible (but from afar), (3) more detailed, including proofs and derivations, and (4) written with in mind the objective of putting the theoretical computer science, statistics, and information theory viewpoints together. Of course, I cannot promise I succeeded; but that was the intent, and you'll be the judge of the result.

Who is this for?

This survey is intended for graduate (doctoral) students, advanced undergraduate students, and generally speaking any researcher interested in the topic with a solid background in theoretical computer science, information theory, and/or discrete mathematics.

Where can I get it?

The published version can be found on the Foundations and Trends® in Communications and Information Theory website; the full draft (with a much less fancy template, but a few minor typos corrected) is also freely available at this address.

How do I cite it?

You can use the following BibTeX snippet, provided by the publisher:

@article{CanonneTopicsDT2022,
	url = {http://dx.doi.org/10.1561/0100000114},
	year = {2022},
	volume = {19},
	journal = {Foundations and Trends® in Communications and Information Theory},
	title = {Topics and Techniques in Distribution Testing: A Biased but Representative Sample},
	doi = {10.1561/0100000114},
	issn = {1567-2190},
	number = {6},
	pages = {1032-1198},
	author = {Clément L. Canonne}
}

Are there solutions to the exercises?

Yes, here.

I have a question! Actually, more of a comment.

Comments, feedback, and questions about the book or the solutions are welcome. Feel free to contact me at clement.canonne@sydney.edu.au.

About the author

Clément Canonne Me is a Senior Lecturer in the School of Computer Science of the University of Sydney; he obtained his Ph.D. in 2017 from Columbia University, before joining Stanford as a Motwani Postdoctoral Fellow, then IBM Research as a Goldstine Postdoctoral Fellow. His main areas of research are distribution testing (and, broadly speaking, property testing) and learning theory; focusing, in particular, on understanding the computational aspects of learning and statistical inference subject to various resource or information constraints.