`add` no longer locks a mutex, and `finish` no longer locks a mutex
except for the last task. This could meaningfully improve performance in
cases where we spawn a large number of tasks on a thread pool. This
change doesn't alter the semantics of the type, unlike the standard
library's `WaitGroup`, which also uses atomics but has to be explicitly
reset.
(For internal tracking: fixes ENG-19722)
---------
Co-authored-by: Jarred Sumner <jarred@jarredsumner.com>