2015-01-22

[Technology] Bitrotting brain

TL;DR


  • have an encrypted /home

  • hard shut down of computer

  • wouldn't boot back up to the login screen

  • couldn't mount /home from emergency mode

  • had to boot a Live USB key and do fsck.ext4 on / (yes, /)

  • problems with mounting /home were probably more to do with emergency mode and what gets enabled for different targets via systemd

  • I need to update myself on my system's work-abouts :D


Premise



I know a lot about my computer and computers.  There are a lot of people who know a lot more than me.  Relative to the rest of the population, there was a time 3.5 years ago when my knowledge was probably ranked at a higher percentile.



I haven't had the same time and motivation these past few years, though.  One of the largest complications has been my Master's.  Sure, I know a lot more about machine learning, linguistics, natural language processing, etc., but that hasn't really helped me understand my computer much more.  I was fortunate enough to participate in the Google Summer of Code three summers in a row, but my time was always tight, and I probably did not make full use of those opportunities to work on GNOME.



However, today was a harsh wake-up call about the consequences of idleness.  My knowledge of init and rc and Linux's old boot sequence is still fairly strong, having grown up on Slackware at the command line.  However, while I've read about the configurations and utilities that have evolved since, I haven't gotten much practice with them.



And so, it was with horror that I realised my system would not fully boot today and that I could not immediately address the issue.



Today's Predicament



My /home directory is an ext4 filesystem on an encrypted LUKS partition.  Perhaps you can immediately see what is going to go wrong.  Earlier today, my computer froze up.  That's not uncommon.  As much as I love Linux, I am never surprised when it spontaneously freezes, especially on a laptop, what with its Internet connection, video mode, and power supply changing constantly.  After trying to get it to respond for 15 seconds, I just held down the power button as usual and then turned it back on.



Plymouth came up, I saw the Fedora logo start to fill, it was interrupted by the password prompt for my /home partition, I entered the password, and then the Fedora logo filled up and - nothing.  It just waited there.  It should have switched to gdm's login screen, but it wouldn't.  I could see the terminal in the background, but I couldn't access any shells.  Just the start-up log.  There was nothing obviously wrong.



Sometimes I would see this message, "EXT4-fs error (device dm-0): ext4_mb_generate_buddy:757: group 83, block bitmap and bg descriptor inconsistent: 21682 vs 21673 free clusters", but dm-0 is my root file system.  dm-3 is my /home file system.
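
When a message like this scrolls past, the useful bits are the device node and the kernel function that complained.  Here is a minimal sketch in Python for pulling those out and for asking sysfs which device-mapper name sits behind a node such as dm-0.  The function names are my own, and it assumes the kernel exposes the name under /sys/block/<node>/dm/name.

```python
import re
from pathlib import Path

# Matches kernel lines of the form
# "EXT4-fs error (device X): function:line: message".
EXT4_ERROR = re.compile(
    r"EXT4-fs error \(device (?P<device>[^)]+)\): "
    r"(?P<function>\w+):(?P<line>\d+): (?P<message>.*)"
)


def parse_ext4_error(log_line):
    """Return device, function, line and message as a dict, or None."""
    match = EXT4_ERROR.search(log_line)
    if match is None:
        return None
    return match.groupdict()


def mapper_name(dm_device, sysfs_root="/sys/block"):
    """Look up the device-mapper name behind a node such as 'dm-0'.

    Reads <sysfs_root>/<dm_device>/dm/name and returns None if that
    file cannot be read (for example, if the node does not exist).
    """
    name_file = Path(sysfs_root) / dm_device / "dm" / "name"
    try:
        return name_file.read_text().strip()
    except OSError:
        return None
```

Feeding the message above to parse_ext4_error() gives "dm-0" as the device, which mapper_name() can then translate into something more recognisable than a number.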



I tried to boot into rescue mode to see if I could still access all my files.  No.  After supplying my password for decryption, I would see this error message "device-mapper: table: 253:3: crypt: unknown target type

Failed to activate: Invalid argument".
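
As far as I understand it, "unknown target type" means the running kernel has no "crypt" device-mapper target registered, which usually points to the dm_crypt module not being loaded rather than to damaged data.  A rough sketch for checking that from Python follows.  It assumes dm-crypt is built as a module (a kernel with it built in would not list it), and the helper names are hypothetical.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            # The first whitespace-separated field is the module name.
            names.add(fields[0])
    return names


def dm_crypt_loaded(proc_modules_path="/proc/modules"):
    """True if dm_crypt is listed as a loaded module.

    False is only a hint: a kernel with dm-crypt built in, or an
    unreadable file, also yields False.
    """
    try:
        with open(proc_modules_path) as handle:
            return "dm_crypt" in loaded_modules(handle.read())
    except OSError:
        return False
```

If this reports False in emergency mode but True after a normal boot, that would fit the theory that the environment, not the partition, was the problem.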



That sounds bad.  I entered the emergency mode shell, and checked the journal.  This is something I only do occasionally using journalctl, as I used to directly check logs or dmesg.  Uh oh.  Here it was failing to mount /home, and it was failing at the step of "/usr/lib/systemd/systemd-cryptsetup attach luks-1496...e527 none" [LUKS name contracted].  Uh oh.
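
For my own future reference, here is a small sketch of how the journal check could be scripted.  It only uses journalctl options I'm sure of (--boot, --priority, --no-pager).  The function names and the defaults are my own choices.

```python
import subprocess


def journal_command(boot_offset=0, priority="err"):
    """Build a journalctl command for one boot, filtered by priority.

    boot_offset 0 is the current boot, -1 the one before it.
    """
    return [
        "journalctl",
        f"--boot={boot_offset}",
        f"--priority={priority}",
        "--no-pager",
    ]


def read_journal(boot_offset=0, priority="err"):
    """Run journalctl and return whatever it printed to stdout."""
    result = subprocess.run(
        journal_command(boot_offset, priority),
        capture_output=True,
        text=True,
        check=False,  # an empty or unreadable journal should not raise
    )
    return result.stdout
```

Calling read_journal(-1) after a successful reboot should show the errors from the boot that failed, provided the journal is persistent.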



At this point I started to suspect that my encrypted partition had been corrupted by my hard shut-down.  There's not much else I can do with my computer when it freezes up, but I hadn't suffered from it being left in a volatile state, at least until now.



I tried to manually mount it, but then realised I have almost never manually mounted a file system encrypted using LUKS.  I've used encfs, but - uh, now what?  Also, lvm2 and systemd I'm only passingly familiar with.  How am I supposed to be confident in my computer's reliability if I don't even understand the fundamental tools that house my data any more?
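
Having looked it up since, the manual route seems to be: activate any LVM volume groups, unlock the LUKS container with cryptsetup, then mount the resulting mapper device.  The sketch below only builds that command sequence without running anything.  The device path, mapper name, and mount point in the usage note are placeholders, not my real layout, and whether the vgchange step is needed depends on whether LVM sits underneath.

```python
import shlex


def luks_mount_plan(device, mapper_name, mount_point, activate_lvm=True):
    """Return the commands (as argument lists) to unlock and mount a LUKS volume.

    Nothing is executed here; the caller decides whether to run them as root.
    """
    plan = []
    if activate_lvm:
        # Make logical volumes visible in case the container lives on LVM.
        plan.append(["vgchange", "-ay"])
    # Prompts for the passphrase and creates /dev/mapper/<mapper_name>.
    plan.append(["cryptsetup", "open", "--type", "luks", device, mapper_name])
    plan.append(["mount", f"/dev/mapper/{mapper_name}", mount_point])
    return plan


def format_plan(plan):
    """Render the plan as shell-quoted lines for reading or copy-pasting."""
    return "\n".join(shlex.join(command) for command in plan)
```

For example, print(format_plan(luks_mount_plan("/dev/sdX2", "home_crypt", "/mnt/home"))) prints three commands to try by hand, with /dev/sdX2 standing in for whatever the real device is.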



I double-checked that my back-up was still working (I've had back-up hard drives die on me before!) using another old computer and thankfully it was OK.  However, I didn't want to restore from it if I didn't have to, as I had done work in the past 24 hours that I didn't want to lose, damn it.  (I'm so glad that I've gotten the knack of backing up at least every 24 hours!)  I also wasn't completely convinced that my data was inaccessible.  On unencrypted file systems, it's easy to grep or scan the raw disk to reclaim many or most files, even if the file system itself is corrupted or the partition boundaries have been lost.  You can't readily do that when the file system is encrypted, but perhaps there was a way to repair it?



Also, this made me wonder, why was the default multi-user graphical mode stalling out?  If it couldn't mount /home, it should still be able to display the login screen.  I don't need a /home for that.  I tried to launch gdm from emergency mode and couldn't because of failed dependencies, one of which was for binary formats.  That seemed weird.  I didn't remember so many things failing to start back when I was booting in the normal way.  So I tried the normal way again and ... it claimed that /home mounted cleanly and even gave a plausible count of files.  (!)



I then decided to boot in single-user mode (edit the grub entry for the kernel I'm booting into and add 'single' to the kernel line).  This took me through the same process as usual, which seemed to mount /home correctly, but instead of going all the way to the point where it had been stalling, it stopped just before that, at a root command line.  Hooray!



Once there, I poked around.  I couldn't immediately tell why I couldn't mount my file systems from emergency mode, but there's an egregious lack of familiarity in me regarding how systemd, lvm2, and LUKS interact, so that's no big surprise.  I did a read-only run of fsck.ext4 on my / file system (remember how dm-0 had errors reported earlier?) and yes, there seemed to be quite a lot of errors.  I grabbed my Fedora live USB key from my drawer, re-booted into it, ran fsck on my computer's root partition, and let it fix all the errors.  This always alarms me, as I never know what data has been lost on my file system to lead to the errors in question.  There's also never a guarantee that cleaning up the errors will resolve the original problem.
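
A read-only check here means fsck.ext4 -n, which opens the file system read-only and answers "no" to every repair prompt.  The actual repair is best done with the file system unmounted, hence the live USB key.  fsck reports its result as a bitmask in the exit status, and the sketch below decodes it.  The bit meanings are from the e2fsck man page; the function names are my own.

```python
# Exit status bits documented in e2fsck(8).
FSCK_FLAGS = {
    1: "errors corrected",
    2: "errors corrected, system should be rebooted",
    4: "errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "cancelled by user request",
    128: "shared library error",
}


def readonly_check_command(device):
    """Build a read-only fsck.ext4 invocation (-n answers 'no' to all fixes)."""
    return ["fsck.ext4", "-n", device]


def describe_fsck_status(code):
    """Translate an fsck exit status into a list of human-readable findings."""
    if code == 0:
        return ["no errors"]
    return [text for bit, text in FSCK_FLAGS.items() if code & bit]
```

A status of 4 from the read-only run decodes to "errors left uncorrected", which is what you would expect when fsck finds problems but is not allowed to fix them.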



Anyway, I rebooted, I went through a normal boot and - tada, the login screen.



Conclusion



So why did I write all that?  In part to motivate myself to learn more about the current state of my system and regain the glory of self-sufficiency.  And in part so that the few error messages I encountered, which someone else in my situation might also encounter, will be documented and searchable. :D
