Feature #16041

Replace rotating drives with new SSDs on lizard

Added by intrigeri 2 months ago. Updated 16 days ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Infrastructure
Target version:
-
Start date:
10/11/2018
Due date:
% Done:
100%
QA Check:
Feature Branch:
Type of work:
Sysadmin
Blueprint:
Starter:
Affected tool:

Description

On the sysadmin side I disabled the old rotating drives:

sudo vgremove spinninglizard
sudo pvremove /dev/mapper/md2_crypt
sudo cryptdisks_stop md2_crypt
sudo mdadm --stop /dev/md2
sudo sed -i --regexp-extended '/^md2_crypt/ d' /etc/crypttab
sudo sed -i --regexp-extended '/^ARRAY \/dev\/md\/2 / d' /etc/mdadm/mdadm.conf
sudo update-initramfs -u
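
A quick way to double-check that nothing still references the old array (a sketch using standard tools; md2 and md2_crypt as above):

cat /proc/mdstat
sudo grep md2 /etc/crypttab /etc/mdadm/mdadm.conf
sudo dmsetup ls

md2 should be gone from the mdstat output, the grep should come back empty, and md2_crypt should no longer show up in the device-mapper listing.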

groente, can you please check if I forgot something? Then reassign to me so I can handle the next steps.


Related issues

Related to Tails - Bug #16131: Broken Samsung SSD 850 EVO 1TB on lizard (Resolved, 11/17/2018)
Related to Tails - Bug #16161: optimise pv placement for io-performance (In Progress, 11/28/2018)
Blocks Tails - Feature #13242: Core work 2017Q4 → 2019Q2: Sysadmin (Maintain our already existing services) (Confirmed, 06/29/2017)
Blocks Tails - Bug #16155: increase jenkins and iso-archive diskspace (Resolved, 11/27/2018)

History

#2 Updated by intrigeri 2 months ago

  • Blocks Feature #13242: Core work 2017Q4 → 2019Q2: Sysadmin (Maintain our already existing services) added

#3 Updated by groente 2 months ago

  • Assignee changed from groente to intrigeri
  • QA Check changed from Ready for QA to Dev Needed

intrigeri wrote:

On the sysadmin side I disabled the old rotating drives:

[...]

groente, can you please check if I forgot something? Then reassign to me so I can handle the next steps.

Apart from the systemd services that tried to bring md2_crypt back up again, which I already mentioned on XMPP, I think that pretty much covers it.

Just to be safe, I would recommend running grub-install again on the remaining disks (sda through sdf); it should already be there, but given the occasional 'grub not found' during lizard reboots, it's better to be safe than sorry before pulling disks out.

#4 Updated by intrigeri 2 months ago

  • Target version changed from Tails_3.10.1 to Tails_3.11

#5 Updated by intrigeri about 2 months ago

groente wrote:

Apart from the systemd services that tried to bring md2_crypt back up again, which I already mentioned on XMPP, I think that pretty much covers it.

FTR that was fixed.
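
For reference, assuming those were the systemd-cryptsetup@.service instances that systemd generates from crypttab, reloading the generators after the crypttab edit should be enough to make them disappear (a sketch, not necessarily the exact fix that was applied):

sudo systemctl daemon-reload
sudo systemctl reset-failed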

Just to be safe, I would recommend running grub-install again on the remaining disks (sda through sdf); it should already be there, but given the occasional 'grub not found' during lizard reboots, it's better to be safe than sorry before pulling disks out.

Good idea! I did sudo dpkg-reconfigure grub-pc, selected /dev/sd[c-f], and let it install GRUB on those drives. Note that we can't install GRUB on /dev/sd[ab] because there's simply no room for it: those drives are fully encrypted, with no partition table or filesystem that GRUB can use.
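
A non-interactive equivalent would be running grub-install on each of those drives directly (a sketch; same device names as above):

for disk in /dev/sd{c..f}; do sudo grub-install "$disk"; done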

#6 Updated by intrigeri about 1 month ago

  • Status changed from Confirmed to In Progress

#7 Updated by intrigeri 27 days ago

  • Related to Bug #16131: Broken Samsung SSD 850 EVO 1TB on lizard added

#8 Updated by intrigeri 27 days ago

Our BIOS was still configured to boot from the rotating drives. I've fixed that.

Pinged taggart on IRC today.

#9 Updated by groente 26 days ago

Due to md1 being degraded (see #16131), the following LVs will be moved from md1 to md4:

    root
    puppet-git-system   *
    apt-system
    apt-data
    rsync-system
    bittorrent-system
    apt-proxy-system
    apt-proxy-data
    whisperback-system
    bitcoin-data        **
    jenkins-system
    bridge-system
    www-system
    misc-system
    puppet-git-data
    bitcoin-system
    bitcoin-swap
    isos-www            **
    isotester1-system
    im-system
    monitor-system
    isotester2-system
    isotester3-system
    isotester4-system
    isotester4-data
    apt-snapshots       **
    isotester5-system
    isotester5-data
    isotester6-system
    isotester6-data
    translate-system    **
    isobuilder1-system
    isobuilder4-system
    isobuilder3-system
    isobuilder3-data    **
    isobuilder2-system 
    isobuilder2-libvirt **
    isobuilder3-libvirt **
    isobuilder4-libvirt **
    isobuilder1-libvirt **
    apt-proxy-swap

LVs marked * also have extents on md3; only the parts on md1 will be moved.
LVs marked ** were already partially on md4.
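
Each move can be done with a pvmove restricted to a single LV and to the source PV, so that only the extents on md1 get relocated (a sketch; the md1_crypt and md4_crypt device-mapper names are assumed by analogy with md2_crypt above):

sudo pvmove --name jenkins-system /dev/mapper/md1_crypt /dev/mapper/md4_crypt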

#10 Updated by intrigeri 23 days ago

  • Assignee changed from intrigeri to bertagaz
  • % Done changed from 0 to 10

Old drives pulled out, new drives plugged in. Please do the basic setup of the new drives (or ask me to do it) and reassign to me so I can do the next steps. See the ML for the required timing & technical details. Thanks in advance!
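
For reference, the basic setup is essentially the teardown from the description in reverse, plus a matching md5_crypt line in /etc/crypttab (a sketch, assuming a RAID-1 pair and the md2_crypt-style naming used above; sdX, sdY and VGNAME are placeholders):

sudo mdadm --create /dev/md5 --level=1 --raid-devices=2 /dev/sdX /dev/sdY
sudo mdadm --detail --scan | grep md5 | sudo tee -a /etc/mdadm/mdadm.conf
sudo cryptsetup luksFormat /dev/md5
sudo cryptsetup luksOpen /dev/md5 md5_crypt
sudo pvcreate /dev/mapper/md5_crypt
sudo vgextend VGNAME /dev/mapper/md5_crypt
sudo update-initramfs -u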

#11 Updated by groente 17 days ago

  • Assignee changed from bertagaz to groente

stealing this ticket because we need the disk space for the sprint

#12 Updated by groente 17 days ago

  • Blocks Bug #16155: increase jenkins and iso-archive diskspace added

#13 Updated by intrigeri 16 days ago

Regarding spreading the I/O load across PVs (i.e. RAID arrays) again:

  • this much seems obvious: spread the ISO builders & testers over at least 2 arrays; they don't use that much I/O though (we've tuned the setup and added memory to minimize I/O needs here)
  • top IOPS consumers (average IOPS over a week, max of read & write): jenkins-data (147.91), apt-snapshots (36.66), translate-system (10.90), apt-proxy-data (7.44), puppet-git-system (6.81), isos (4.86), bitcoin-data (3.80)
  • ISO builders & testers, when busy, make other volumes busy (mainly jenkins-data, apt-snapshots, apt-proxy-data); let's separate them if we can

So let's try this:

  • md3 (old, 500GB): translate-system, apt-proxy-data, puppet-git-system, bitcoin-data, half of Jenkins workers (isobuilders 1-2, isotesters 1-3)
  • md4 (old, 2TB): jenkins-data, isos, 1/4 of ISO builders & testers (isobuilder3, isotester4)
  • md5 (new, 4TB): apt-snapshots, 1/4 of Jenkins workers (isobuilder4, isotesters 5-6)

I'll do this once the lower part of the stack is ready, and a week or two later I'll check latency and IOPS per PV, which should tell me how good or bad this first iteration was.
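
Checking the resulting placement and the per-array load can be done with stock tools (a sketch; iostat comes from the sysstat package):

sudo lvs -o lv_name,devices
iostat -dx 60 md3 md4 md5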

#14 Updated by groente 16 days ago

  • Assignee changed from groente to intrigeri
  • QA Check changed from Dev Needed to Pass

go for it, once that's done I think this ticket can be closed \o/

#15 Updated by intrigeri 16 days ago

  • % Done changed from 10 to 50
  • QA Check deleted (Pass)

#16 Updated by intrigeri 16 days ago

Amending the plan: md3 would be too full if we did exactly that, so I'll move the isobuilder2 volumes to md5 instead.

#17 Updated by groente 16 days ago

  • Related to Bug #16161: optimise pv placement for io-performance added

#18 Updated by groente 16 days ago

  • Status changed from In Progress to Resolved
  • Target version deleted (Tails_3.11)
  • % Done changed from 50 to 100

all done with the disk replacement; created a new ticket (#16161) for the pv-switcheroo
