Document metadata analysis tools
Our main documentation index lacks a page about metadata and MAT.
It should also mention pdf-redact-tools, tesseract-ocr, and ffmpeg. See #17178.
The homepage of MAT (https://mat.boum.org/) is pretty good at introducing what metadata are. So we should definitely point to it, and maybe reuse parts of it.
#4 Updated by ArgySan over 4 years ago
Let me know: is this way too simplistic?
What is MAT?
MAT (Metadata Anonymisation Toolkit) is a toolkit that anonymizes and removes metadata from your files. It does this utilizing a library, a GUI application and, if you prefer, a CLI application.
How does it work?
Simply put, MAT removes all metadata from files. Unfortunately, watermarks and steganographic tags won’t be removed. But unlike metadata, which many utilities add by default, watermarks are not usually added inadvertently, and the original author will likely be aware of their existence. Basically, MAT will protect you from accidental metadata leakage, but not from customized metadata specifically included to track you down as the author.
Why do we care?
Because just about every file uploaded to the internet contains metadata. From Office documents to .flac audio files and beyond, they all have metadata embedded, and that metadata can tell the world where, when, and, most crucially, by whom the file was created. This defeats the purpose of Tails and our ‘privacy for anyone, anywhere’ mantra.
So, to ensure you stay anonymous, Tails includes MAT.
Currently supported files include:
- Portable Network Graphics (.png)
- JPEG (.jpg, .jpeg, etc.)
- Open Documents (.odt, .odx, .ods, etc.)
- Office Open XML (.docx, .pptx, .xlsx, etc.)
- Portable Document Format (.pdf)
- Tape ARchives (.tar, .tar.bz2, etc.)
- MPEG Audio (.mp3, .mp2, .mp1, etc.)
- Ogg Vorbis (.ogg, etc.)
- Free Lossless Audio Codec (.flac)
- Torrent (.torrent)
MAT can be accessed via Applications > System Tools in the Tails GUI.
#6 Updated by sajolida over 4 years ago
- QA Check set to Dev Needed
Hi, thanks a lot for working on this.
I don't think it's too simplistic; rather, I think it should be simplified further :) You're providing the right information and it is well structured, but here are a bunch of recommendations that will make your text even better:
- I would restructure the page to remove the first two titles and merge their content into a single intro block without a title, like we do on /doc/advanced_topics/paperkey/.
- Explain metadata in simple words, cf. /doc/sensitive_documents/metadata/ and /doc/about/warning/. If you manage to only insert and not modify parts of /doc/sensitive_documents/metadata/, then I can maybe create an inline to include the same text in both places.
- Maybe some examples here would best explain what metadata are. Maybe you can reuse stuff that's already on the MAT homepage: https://mat.boum.org/.
- I'm not sure about the need for the title of the section "Why do we care?". Maybe reusing stuff from those warnings might be enough. But we can see later.
- I like the list of supported formats. Maybe we could make it a bit more compact by having only one line each for image formats, documents, and audio formats.
- Maybe we can further simplify this list by listing the corresponding file name extensions only when they really add something. For example, I'm not sure it's needed to specify them for JPEG, Ogg Vorbis, Torrent, and maybe others. Now I see that you copied this list from MAT (which was a good idea). But maybe our objectives here are a bit different. In Tails we want our users to understand as quickly as possible whether MAT will work on their files, while MAT has to provide a complete and technically unambiguous list for reference.
- Can you find a title for the page by looking at the style we usually use for other tools? That would go in the
- Check how we write the markup for menu paths in /contribute/how/documentation/guidelines/.
- Add a link to upstream (https://mat.boum.org/).
- We try to write our doc as concisely as possible. It's OK to have very few words on a page as long as they provide the information we need. People will get it faster and translators will have less work to do. In light of this, I suggest you edit your text once or twice more and look for words or parts of sentences that can be removed without losing meaning. You can probably lower your total word count by 30% to 50% while still improving the quality of the text. For example, I could spot " and our ‘privacy for anyone, anywhere’ mantra", "So, to ensure you stay anonymous", and " in the Tails GUI".
- Try to stay neutral regarding what people might or might not find "easy" or "simple". For example I would replace or get rid of "Simply put".
- Avoid jargon: use only words that people are likely to know already, or that you want them to learn by providing an explanation. I'm not sure that "watermarks", "steganographic", and "embedded" are likely to be understood. Remember that many (if not most) of our readers are not native English speakers.
- Avoid the future tense ("will"), which most of the time introduces unnecessary ambiguity about when the action will happen. For example "the original author will likely be aware" → "the original author is likely aware", "MAT will protect you" → "MAT protects you".
- Limit your sentences to 20-25 words in general. Even when providing the same content, breaking it into several sentences makes it easier to read and understand.
- "Toolkit" → "tool"
- "Utilize" → "use", see https://en.wikipedia.org/wiki/Politics_and_the_English_Language "Never use a long word where a short one will do."
- Avoid abbreviations like "GUI" (maybe "graphical" or nothing will do) and "CLI" (see what we do on /doc/advanced_topics/paperkey/).
- If you want to go deeper into these considerations, the GNOME Documentation Style Guide is a very good start. See https://developer.gnome.org/gdp-style-guide/.
Good luck with all that :)
Once you're done, you can assign the ticket to me and set its "QA Check" to "Ready for QA".
#10 Updated by u about 2 years ago
We actually do have a page about metadata that links to MAT: doc/sensitive_documents/metadata.mdwn.
Based on this and on the work previously done here, we could write nice documentation.
If anybody is up for this, don't hesitate to work on this ticket.
The code of our wiki lives here: http://git-tails.immerda.ch/