Privacy-preserving way of counting Tails installations
Originally created by @sajolida on #16535 (Redmine)
Inspiration and design
Cf. how Fedora does it:
- https://fedoraproject.org/wiki/Changes/DNF_Better_Counting
- the initial "countme" idea by Lennart Poettering, which led to the "DNF Better Counting" proposal, i.e. with no need for UUIDs: https://lwn.net/ml/fedora-devel/20190108152239.GA24118@gardel-login/
- The high-level story: https://lwn.net/Articles/776327/
And another UUID-less design (that was not implemented): https://wiki.mozilla.org/MetricsDataPing#Anonymous_alternative
For Tails, we could send a “countme” along with the fetch of the
security update feed every time a persistence is mounted in a new month,
based on the “Last mount time” of /dev/mapper/TailsData_unlocked.
This would allow us to count the unique installations of Tails with persistence used in a given month (or week) in a privacy-preserving way.
It would complete the picture that we already have on:
- Boot counts
- Downloads (#14922 (closed))
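A minimal sketch of the “countme” trigger described above, assuming the previous “Last mount time” has already been read (before unlocking overwrites it) and parsed into a datetime. The function name and the choice to count a never-mounted volume are hypothetical, not part of any agreed design:

```python
from datetime import datetime
from typing import Optional


def should_send_countme(last_mount: Optional[datetime], now: datetime) -> bool:
    """Decide whether this feed fetch should carry a "countme".

    last_mount: previous "Last mount time" of the persistent volume,
                or None if it was never mounted (assumption: count it).
    now:        current time, in the same timezone as last_mount.
    """
    if last_mount is None:
        return True
    # Send only when the calendar month has advanced since the last mount.
    # If the clock went backwards, nothing is sent.
    return (now.year, now.month) > (last_mount.year, last_mount.month)
```

Because only the previous mount time is compared with the current date, no identifier has to be stored or transmitted to avoid counting the same installation twice in a month.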
Specs
- Granularity per week
- It's OK if this does not give us an accurate with/without persistence ratio: we already have data about the portion of Tails users with persistence elsewhere (WhisperBack reports).
Combine this with reporting version number
Combining this with reporting version number (#17545) would have the following advantages:
- A single request carrying both pieces of information
- Only once per week if Persistent Storage
- Every time if no Persistent Storage ⇒ we can't de-duplicate those boots from the same device.
- Additional bit in the encrypted data
- Single tooling to parse and extract data
- The resulting data is more useful, e.g. gives us how many installs are outdated, not how many boots were done from an outdated Tails.
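The combined request could be sketched roughly as follows. Everything here is an assumption for illustration: the function name, the field names, the use of ISO weeks for the weekly granularity, and the idea that the date of the last report is available when Persistent Storage is unlocked. The actual design is still to be drafted with the sysadmins.

```python
from datetime import date
from typing import Optional


def build_report(version: str,
                 has_persistent_storage: bool,
                 last_report: Optional[date],
                 today: date) -> Optional[dict]:
    """Return the fields to send, or None if nothing should be sent."""
    if has_persistent_storage and last_report is not None:
        # With Persistent Storage: report at most once per ISO (year, week).
        if last_report.isocalendar()[:2] == today.isocalendar()[:2]:
            return None
    # Without Persistent Storage there is no state to consult, so a report
    # goes out on every boot and cannot be de-duplicated per device.
    return {
        "version": version,
        # The "additional bit": 1 if Persistent Storage is in use.
        "persistent": int(has_persistent_storage),
    }
```

For example, with "4.5" as a sample version string, `build_report("4.5", True, date(2020, 4, 6), date(2020, 4, 8))` returns `None` because both dates fall in the same ISO week, while the same call with `has_persistent_storage=False` always returns a report.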
Next steps
- draft design based on sysadmins' proposal for #17545
- ask sysadmins to review the design and improve it like the smart security/privacy-minded people they are, rinse and repeat
- implement: Tails-client-side code [FT] + parsing code [sajolida]
- deploy in Tails
- iterate at will
And optionally, figure out how maybe some day the data processing + output publication could be done automatically on our infra, instead of manually by 1 person who first has to get access to tons of logs.