Bug #9705

Feature #7563: Update the automated test suite for Jessie ISO images

Update the memory erasure automated tests for Jessie

Added by intrigeri about 4 years ago. Updated almost 4 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
Test suite
Target version:
Start date:
07/08/2015
Due date:
01/15/2016
% Done:

100%

Feature Branch:
feature/jessie
Type of work:
Code
Blueprint:
Starter:
Affected tool:

Description

I see two of its scenarios fail due to insufficient pattern coverage after filling RAM with a known pattern. We should first check that we correctly measure this (e.g. I've fixed some of it with 505297c already, and there may be more to update). If we don't, then fix it. If we do, then assess whether it's expected that fewer patterns are found on Jessie.


Related issues

Related to Tails - Bug #10487: Improve VM and OOM settings for erasing memory Resolved 11/05/2015
Blocked by Tails - Bug #8317: fillram isn't effective for filling the memory in Jessie Resolved 11/26/2014

Associated revisions

Revision e901c7f9 (diff)
Added by anonym almost 4 years ago

Rework how we measure pattern coverage.

The approaches we used before just seem incorrect in multiple ways:

  • vm.overcommit_memory = 2 seems to make OOM killing happen way too
    early, around when 15% of RAM is still free, no matter the value of
    vm.overcommit_ratio and the various *reserved settings.
  • Using the ratio of a ratio for the {min,max}_coverage calculations
    was just weird and a sign of incremental bandaid-like fixes.

Now we do what we intended more explicitly, that is: measure the
pattern coverage in the free RAM at the time we start filling it. We
also have to reserve some RAM for the kernel (so it doesn't freeze
when approaching full RAM) and admin processes like the remote shell
(so it doesn't get unresponsive), and we exclude this from the free
RAM in the calculations for a much more accurate result than before.

Note that we may end up having a bit more than 100% pattern
coverage. While that may seem strange, it may just indicate that some
unrelated RAM was freed after we noted the amount of free RAM right
before we started filling it.

Will-fix: #9705
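The calculation the commit message describes can be sketched roughly as follows. This is an illustration only, not the actual test-suite code: the method and constant names are invented here, and the 64 MiB / 128 MiB reservations are the values mentioned later in the review discussion.

```ruby
# Illustrative sketch only: measure pattern coverage against the RAM
# that was actually fillable, i.e. the free RAM when filling started
# minus what we deliberately reserve so the kernel and admin processes
# (like the remote shell) stay responsive.
KERNEL_RESERVED_B = 64 * 2**20   # assumed reservation for the kernel
ADMIN_RESERVED_B  = 128 * 2**20  # assumed reservation for admin processes

def pattern_coverage(patterns_b, free_mem_before_fill_b)
  fillable_b = free_mem_before_fill_b - KERNEL_RESERVED_B - ADMIN_RESERVED_B
  patterns_b.to_f / fillable_b
end
```

Under this scheme a coverage slightly above 1.0 is possible, as the commit message notes, if unrelated RAM was freed after the free-RAM snapshot was taken.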

History

#1 Updated by intrigeri about 4 years ago

  • Parent task set to #7563

#2 Updated by intrigeri about 4 years ago

  • Tracker changed from Feature to Bug

#3 Updated by intrigeri about 4 years ago

  • Due date set to 01/15/2016

#6 Updated by anonym almost 4 years ago

  • Status changed from Confirmed to In Progress

#7 Updated by anonym almost 4 years ago

  • Status changed from In Progress to Confirmed
  • Assignee deleted (anonym)
  • % Done changed from 0 to 50
  • QA Check set to Ready for QA
  • Feature Branch set to feature/jessie

intrigeri wrote:

I see two of its scenarios fail due to insufficient pattern coverage after filling RAM with a known pattern.

For context, I get:

  Scenario: Anti-test: no memory erasure on an old computer
[...]
    Then I find many patterns in the guest's memory
      Pattern coverage: 59.701% (2118 MiB)
      59.701% of the memory is filled with the pattern, but more than 70.000% was expected.

and

  Scenario: Memory erasure on an old computer
[...]
    When I fill the guest's memory with a known pattern
      Pattern coverage: 60.590% (2149 MiB)
      60.590% of the memory is filled with the pattern, but more than 65.593% was expected.

My guess is that we get this simply because GNOME etc. in Jessie require more RAM than in Wheezy, and our calculations fail to take this into account, despite attempting to use ratios.

We should first check that we correctly measure this (e.g. I've fixed some of it with 505297c already, and there may be more to update).

Indeed, what we do looks crazy...

If we don't, then fix it. If we do, then assess whether it's expected that fewer patterns are found on Jessie.

I think it is a combination of both. I've pushed commit e901c7f which I believe makes a lot more sense than what we did previously, on multiple fronts. I hope the added comments and commit message speak for themselves.

That commit would probably make sense in our wheezy-based branches too, but these tests work adequately there, so perhaps we can live with it. Otherwise I guess we can cherry-pick this commit. Thoughts?

#8 Updated by anonym almost 4 years ago

  • Status changed from Confirmed to In Progress
  • Assignee set to intrigeri

#9 Updated by intrigeri almost 4 years ago

  • Assignee changed from intrigeri to anonym
  • % Done changed from 50 to 70
  • QA Check changed from Ready for QA to Dev Needed

anonym wrote:

I think it is a combination of both. I've pushed commit e901c7f which I believe makes a lot more sense than what we did previously, on multiple fronts. I hope the added comments and commit message speak for themselves.

Excellent work! I'm going to nitpick on a lot of things (because that's a complex part of the test suite, and it'd better be extra clear) but I like the general idea a lot. I've run this test suite feature once, and seen it pass.

The reasoning for setting oom_kill_allocating_task to 0 should apply to ocal-includes/usr/share/initramfs-tools/scripts/init-premount/sdmem (where we set it to 1) as well, no? Same question for setting /proc/sys/vm/overcommit_memory to 0 (we set it to 1 in the initramfs hook).

In passing, oom_score_adj is inherited:

# echo $$
29977
# echo -1000 > /proc/$$/oom_score_adj
# cat /proc/$$/oom_score_adj
-1000
# bash
# echo $$
30075
# cat /proc/$$/oom_score_adj
-1000

... so there's a race between the time we start the fillram processes (they inherit -1000 from the remote shell) and the time we set their oom_score_adj to 1000, during which the kernel will prefer killing anything but the fillram processes, no? Assuming there's no way to "exec that command with that oom_score_adj", perhaps fillram should wait until its oom_score_adj is 1000 before it starts actually filling memory?
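One way to implement the "wait until the adjustment has landed" idea could look like the following. This is purely a hypothetical sketch: wait_for_value is not a real helper from the test suite, and (as noted further down in this thread) the fix actually merged in be2d54f took a different, more local approach.

```ruby
# Hypothetical helper: poll a /proc-style file until it contains the
# expected value, so fillram would only start allocating once the
# parent has actually raised its oom_score_adj to 1000, closing the
# race window in which the kernel would prefer to kill other processes.
def wait_for_value(path, expected, timeout: 10)
  deadline = Time.now + timeout
  until File.read(path).strip == expected.to_s
    raise "timed out waiting for #{path} to become #{expected}" if Time.now > deadline
    sleep 0.1
  end
end
```

fillram could then call wait_for_value("/proc/self/oom_score_adj", 1000) before entering its allocation loop.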

Code style: your love of state (variables ;) shows again with kernel_mem_settings. Generally I would look elsewhere, but I particularly dislike that one, not only because I really don't see the point (I'm used to dealing with it, and this time it's debatable, kind of), but also because the variable name doesn't correctly express what it is actually storing (a list of command lines). So if you feel the need for a variable, keep it, but please rename it :)

Did you try lowering the amount of memory reserved for the kernel (64MB) and for admin tasks (128MB)? I'm just wondering if perhaps you set arbitrarily high values when starting working on it, and then forgot to come back to it later.

The commit message reads:

    Now we do what we intended more explicitly, that is: measure the
    pattern coverage in the free RAM at the time we start filling it.

... while the commit does:

-  coverage = patterns_b.to_f/convert_to_bytes(@detected_ram_m.to_f, 'MiB')
-  puts "Pattern coverage: #{"%.3f" % (coverage*100)}% (#{patterns_m} MiB)" 
+  coverage = patterns_b.to_f/@free_mem_after_fill_b
+  puts "Pattern coverage: #{"%.3f" % (coverage*100)}% (#{patterns_m} MiB " +
+       "out of #{free_mem_after_fill_m} MiB initial free memory)" 

At first glance it felt like they're totally contradicting each other (and the new displayed message doesn't reflect reality). A closer look at the code showed me that they are actually not contradicting each other, but rather free_mem_after_fill_m is very badly named. Or did I get it wrong?

Shall we change the "Memory fill progress" string? It's the percentage of used RAM, rather than the percentage of filling we plan to do, right?

That commit probably would make sense in our wheezy-based branches too, but there these tests work adequately so perhaps we can live with it.

Let's focus on Tails 2.0 and not waste time backporting stuff without a good reason.

#10 Updated by anonym almost 4 years ago

  • Assignee changed from anonym to intrigeri
  • QA Check changed from Dev Needed to Ready for QA

intrigeri wrote:

anonym wrote:

I think it is a combination of both. I've pushed commit e901c7f which I believe makes a lot more sense than what we did previously, on multiple fronts. I hope the added comments and commit message speak for themselves.

Excellent work! I'm going to nitpick on a lot of things (because that's a complex part of the test suite, and it'd better be extra clear) but I like the general idea a lot. I've run this test suite feature once, and seen it pass.

Thanks!

The reasoning for setting oom_kill_allocating_task to 0 should apply to ocal-includes/usr/share/initramfs-tools/scripts/init-premount/sdmem (where we set it to 1) as well, no? Same question for setting /proc/sys/vm/overcommit_memory to 0 (we set it to 1 in the initramfs hook).

Sure but note that all the kernel settings we do there are only for the memory filling part, to prevent the VM from freezing while doing it, essentially. All settings we do here are lost when we kexec into the memwipe kernel, after all. What you ask for are unrelated bugfixes in Tails.

In passing, oom_score_adj is inherited:

[...]

... so there's a race between the time we start the fillram processes (they inherit -1000 from the remote shell) and the time we set their oom_score_adj to 1000, during which the kernel will prefer killing anything but the fillram processes, no? Assuming there's no way to "exec that command with that oom_score_adj", perhaps fillram should wait until its oom_score_adj is 1000 before it starts actually filling memory?

Indeed. While yours is a clever solution, I think the way I've now implemented this in be2d54f is clearer and more local.

Code style: your love of state (variables ;) shows again with kernel_mem_settings.

The state is irrelevant. It's just an attempt to write easier-to-read code. I greatly prefer introducing variables with a good name over a comment explaining some magic value or how some long chain of functions transforms into something usable. It's also a good way to split up code into easier-to-read blocks, so it's easier to follow and understand intermediary steps. Feel free to present arguments against this practice, because I intend to keep doing it unless otherwise convinced. :)

Generally I would look elsewhere, but I particularly dislike that one, not only because I really don't see the point (I'm used to dealing with it, and this time it's debatable, kind of), but also because the variable name doesn't correctly express what it is actually storing (a list of command lines). So if you feel the need for a variable, keep it, but please rename it :)

I agree that the name is poor in the state you reviewed this. Background: I introduced that variable while experimenting and I originally had a set of default options that I altered a bit when the 64-bit kernel was used, hence the need for the variable to avoid duplication. It turned out that the kernel bitness didn't matter in the end. Furthermore, I had switched to key-value pairs for sysctl, and then the name made a lot more sense. The situation got a bit confusing after I squashed the history when my experiments were done, apparently. I've fixed this in e987ef9.

Just to explain a bit why I think introducing explanatory variables is a good thing, let's look at how this would have worked without that variable:

[
  [key0, val0],
  ...
  [keyn, valn],
].each { |key, val| $vm.execute_successfully("sysctl #{key}=#{val}") }

Since I read from top to bottom, when I encounter this list, I don't know what it is. I have to jump to the end, where I see what we do with it (sysctl key-value pairs) and then deduce what the list is from what happens in the body (and possibly from the names of the temporary loop variables). A comment could fix it, but context switching from Ruby to English and then back to Ruby just feels unnecessary, and it's not very DRY.

Furthermore, Ruby, with its idiomatic list.each do |element| ... convention for loops, makes this extra apparent, since in most other languages we'd have for element in list, which would help to explain this at the top. In fact, in this case we would then probably be more explanatory, like this: for memory_setting, value in ... or something, which makes it clear what the list that follows will be about. But what actually happened here? Yeah, we were forced to introduce variables for the for-loop, and of course we picked good names. That made any comment about what the list is unnecessary.
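As a concrete illustration of the explanatory-variable style under discussion, the renamed version might look something like this. The setting names and values here are assumptions drawn from this thread (oom_kill_allocating_task and overcommit_memory were both discussed), not the actual contents of the commit:

```ruby
# Sketch: name the data for what it is -- sysctl key-value pairs --
# instead of iterating over an anonymous list of pairs.
kernel_mem_sysctls = {
  "vm.oom_kill_allocating_task" => 0,  # assumed value, per the discussion
  "vm.overcommit_memory"        => 0,  # assumed value, per the discussion
}

# Build the command lines to run; in the test suite these would be fed
# to something like $vm.execute_successfully on the guest.
sysctl_commands = kernel_mem_sysctls.map do |key, val|
  "sysctl #{key}=#{val}"
end
```

Here the hash's name tells the reader up front that these are kernel memory sysctls, before any loop body is reached.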

Another example from the same .rb file:

instances = (@detected_ram_m.to_f/(2**10)).ceil
instances.times do [...]

IMHO this is a lot easier to understand than
(@detected_ram_m.to_f/(2**10)).ceil.times do [...]

although nr_instances admittedly is a better name (44545b2). If we try to save the first approach with a comment like "Run one instance per GiB of RAM" just to make it clear what the code does, then we break the DRY principle, and we also have a clear indication that the code is not very readable. One shouldn't have to add a comment stating what some code does, only why it's done, like the existing comment for the above (which incidentally contains that sentence, but only as a part of the explanation why).
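For concreteness, the computation being discussed works out like this (the detected RAM amount below is a hypothetical value, and the variable names follow the renamed version):

```ruby
# Illustrative only: run one fillram instance per GiB of detected RAM.
detected_ram_m = 3500  # hypothetical detected RAM, in MiB
nr_instances = (detected_ram_m.to_f / 2**10).ceil
# 3500 / 1024 is about 3.42, so ceil rounds up to 4 instances
```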

Perhaps I have just misunderstood you completely, because I find your criticism of my variable usage quite strange. (?)

Did you try lowering the amount of memory reserved for the kernel (64MB) and for admin tasks (128MB)? I'm just wondering if perhaps you set arbitrarily high values when starting working on it, and then forgot to come back to it later.

Yes, these values were picked after hours of testing different combinations. Sorry, I thought I mentioned this in the commit message.

At first glance it felt like they're totally contradicting each other (and the new displayed message doesn't reflect reality). A closer look at the code showed me that they are actually not contradicting each other, but rather free_mem_after_fill_m is very badly named. Or did I get it wrong?

Wow, mind fart! Fixed in 082ea26.

Shall we change the "Memory fill progress" string? It's the percentage of used RAM, rather than the percentage of filling we plan to do, right?

Well, the goal of the memory fill, and indeed what we aim at, is that all RAM should be filled, right?

That commit would probably make sense in our wheezy-based branches too, but these tests work adequately there, so perhaps we can live with it.

Let's focus on Tails 2.0 and not waste time backporting stuff without a good reason.

Exactly.

#11 Updated by intrigeri almost 4 years ago

  • Blocked by Bug #8317: fillram isn't effective for filling the memory in Jessie added

#12 Updated by intrigeri almost 4 years ago

  • Related to Bug #10487: Improve VM and OOM settings for erasing memory added

#13 Updated by intrigeri almost 4 years ago

  • Status changed from In Progress to Resolved
  • QA Check changed from Ready for QA to Pass

(Ouch, it seems I managed to annoy/stress you quite a bit on this one. Sorry.)

The reasoning for setting oom_kill_allocating_task to 0 should apply to ocal-includes/usr/share/initramfs-tools/scripts/init-premount/sdmem (where we set it to 1) as well, no? Same question for setting /proc/sys/vm/overcommit_memory to 0 (we set it to 1 in the initramfs hook).

Sure

Thanks for confirming, this allowed me to file #10487.

but note that all the kernel settings we do there are only for the memory filling part, to prevent the VM from freezing while doing it, essentially. All settings we do here are lost when we kexec into the memwipe kernel, after all.

Sure, I got this part :)

What you ask for are unrelated bugfixes in Tails.

To be clear, I did not mean to suggest that you should fix these bugs yourself/now: I merely wanted to know if indeed they were bugs in your opinion. Sorry for being unclear and raising your stress levels needlessly :/

Indeed. While yours is a clever solution, I think the way I've now implemented this in be2d54f is clearer and more local.

Excellent, I like it!

Code style: your love of state (variables ;) shows again with kernel_mem_settings.

The state is irrelevant. It's just an attempt to write easier-to-read code. I greatly prefer introducing variables with a good name over a comment explaining some magic value or how some long chain of functions transforms into something usable. It's also a good way to split up code into easier-to-read blocks, so it's easier to follow and understand intermediary steps. Feel free to present arguments against this practice, because I intend to keep doing it unless otherwise convinced. :)

ACK wrt. naming things. But that's a matter of style, and I shouldn't have started this debate here (let's keep it for face-to-face pair programming or similar), and should have instead stuck to the narrower buggy variable name issue.

The situation got a bit confusing after I squashed the history when my experiments were done, apparently. I've fixed this in e987ef9.

Excellent, thanks :)

Just to explain a bit why I think introducing explanatory variables is a good thing, [...]

Thanks for the explanation. I agree all these problems need solutions. I just happen to disagree about the single-use-variable solution, but again that's a matter of taste -- FWIW, for such a use case (that is, a variable that's used once), replacing it with a function would address my concerns about state / procedural code, and (I think) solve the same problems as you're introducing variables for. Now, I'm not saying that my own code is as nice (to my own taste) as I'm suggesting yours could/should be, and I'm definitely not authoritative in this area.

Shall we change the "Memory fill progress" string? It's the percentage of used RAM, rather than the percentage of filling we plan to do, right?

Well, the goal of the memory fill, and indeed what we aim at, is that all RAM should be filled, right?

Fair enough.

#14 Updated by intrigeri almost 4 years ago

  • Assignee deleted (intrigeri)
  • % Done changed from 70 to 100
