2023-09-25

[Technology] docker.io/library/node container image size

As seen in my last post, the node image on docker is over 1GB. I was curious what contributed to that, so I ran and attached myself to a shell in it.

$ podman run -dt node bash
$ podman attach -l
root@...:/#

A quick query of installed packages and their sizes gave me these top 30 packages:

dpkg-query -Wf '${Installed-Size}\t${Package}\n' | sort -nr | head -n 30

Size (kB)  Deb Package                  % (of installed packages)
68,236     gcc-12                       7.65
47,784     libicu-dev                   5.36
44,477     git                          4.99
36,170     libicu72                     4.05
36,161     g++-12                       4.05
33,848     cpp-12                       3.79
28,862     libperl5.36                  3.23
19,446     libstdc++-12-dev             2.18
18,062     coreutils                    2.02
17,816     perl-modules-5.36            1.99
16,203     libx265-199                  1.81
15,021     binutils-common              1.68
14,577     mercurial-common             1.63
14,284     libgcc-12-dev                1.60
12,986     libc6                        1.45
12,303     libssl-dev                   1.38
11,957     libc6-dev                    1.34
11,428     binutils-x86-64-linux-gnu    1.28
10,909     librsvg2-2                   1.22
10,076     sq                           1.13
10,075     libglib2.0-dev               1.13
9,404      libglib2.0-data              1.05
8,323      libpython3.11-stdlib         0.93
8,130      libmagic-mgc                 0.91
8,017      libasan8                     0.89
7,815      libtsan2                     0.87
7,638      perl-base                    0.85
7,164      bash                         0.80
6,761      python3.11-minimal           0.75
6,589      linux-libc-dev               0.73

Surprisingly, no packages for nodejs/npm are listed! I calculated the sum of the size of all installed packages, and got 891,320 kB, but overall disk usage in the image's final file system is 1,139,008 kB. That's a discrepancy of 247,688 kB! Poking around to find some evidence of it, I see that npm is installed at /usr/local/bin/npm. All of /usr/local takes up 167,728 kB in the final image. Let's poke around and see which layer added that.
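
Before that, for anyone wanting to redo the arithmetic, here is a minimal sketch. It assumes a Debian-based image where dpkg-query is available. sum_kb is just a helper name made up for this example, and the two totals are the figures quoted above.

```shell
# Sum a column of sizes (one integer per line, in kB) read from stdin.
sum_kb() {
    awk '{ total += $1 } END { print total + 0 }'
}

# Inside the container (assumes dpkg-query is present):
#   dpkg-query -Wf '${Installed-Size}\n' | sum_kb

# The gap between the two figures quoted above, in kB.
packages_kb=891320
disk_kb=1139008
echo "unaccounted: $((disk_kb - packages_kb)) kB"
```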

$ podman image tree node
Image ID: 386e0be86bde
Tags:     [docker.io/library/node:latest]
Size:     1.122GB
Image Layers
├── ID: 7c85cfa30cb1 Size: 121.3MB
├── ID: a9af9831483f Size: 49.56MB
├── ID: 68731f6c1e1e Size: 181.4MB
├── ID: d396cdbdc3ad Size: 596.9MB
├── ID: 02f1f39c8ec9 Size: 22.53kB
├── ID: 4709b21dad1d Size: 164.8MB
├── ID: 24d9598c79a8 Size: 7.626MB
└── ID: 19ba6ee0c874 Size: 3.584kB Top Layer of: [docker.io/library/node:latest]

$ podman history node
ID            CREATED     CREATED BY                                                                   SIZE              COMMENT
386e0be86bde  4 days ago  /bin/sh -c #(nop)  CMD ["node"]                                              0B                
<missing>     4 days ago  /bin/sh -c #(nop)  ENTRYPOINT ["docker-ent...                                0B                
<missing>     4 days ago  /bin/sh -c #(nop) COPY file:4d192565a7220e...                                3.58kB            
<missing>     4 days ago  /bin/sh -c set -ex   && for key in     6A0...                                7.63MB            
<missing>     4 days ago  /bin/sh -c #(nop)  ENV YARN_VERSION=1.22.19                                  0B                
<missing>     4 days ago  /bin/sh -c ARCH= && dpkgArch="$(dpkg --pri...                                165MB             
<missing>     4 days ago  /bin/sh -c #(nop)  ENV NODE_VERSION=20.7.0                                   0B                
<missing>     4 days ago  /bin/sh -c groupadd --gid 1000 node   && u...                                22.5kB            
<missing>     5 days ago  /bin/sh -c set -ex;                            apt-get update;   apt-...     597MB       
<missing>     5 days ago  /bin/sh -c apt-get update && apt-get insta...                                181MB             
<missing>     5 days ago  /bin/sh -c set -eux;                           apt-get update;   apt...      49.6MB      
<missing>     5 days ago  /bin/sh -c #(nop)  CMD ["bash"]                                              0B                
<missing>     5 days ago  /bin/sh -c #(nop) ADD file:ce04d6a354feaef...                                0B    

Taking a peek in ~/.local/share/containers/storage/overlay and doing a find against each of the layer IDs, looking for the npm binary, I get this:

./4709b21dad1d5d5cb88803c9c598c81fd9a94de38398ad002ec569cff33445c7/diff/usr/local/bin/npm
./4709b21dad1d5d5cb88803c9c598c81fd9a94de38398ad002ec569cff33445c7/diff/usr/local/lib/node_modules/corepack/shims/nodewin/npm
./4709b21dad1d5d5cb88803c9c598c81fd9a94de38398ad002ec569cff33445c7/diff/usr/local/lib/node_modules/corepack/shims/npm
./4709b21dad1d5d5cb88803c9c598c81fd9a94de38398ad002ec569cff33445c7/diff/usr/local/lib/node_modules/npm
./4709b21dad1d5d5cb88803c9c598c81fd9a94de38398ad002ec569cff33445c7/diff/usr/local/lib/node_modules/npm/bin/npm
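
One way to produce a listing like that (not necessarily the exact command used here) is a plain find by name, run from the overlay directory. This is a sketch. find_in_layers is a hypothetical helper, and the storage path is the rootless podman location mentioned above.

```shell
# Print every path named "$2" beneath the directory "$1", relative to "$1".
find_in_layers() {
    ( cd "$1" && find . -name "$2" | sort )
}

# Example (rootless podman's overlay storage):
#   find_in_layers ~/.local/share/containers/storage/overlay npm
```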

Peeking at its directory structure, I see that the layer is 167,712 kB and accounts for all of /usr/local, and it's entirely node files. Some of the most notable disk usage is as follows:

Size (kB)  Path                                         Comment
93,524     usr/local/bin/node                           The binary itself.
53,004     usr/local/include/node/openssl/              Most of this is in its archs/ subdirectory, with 21 architectures' .h files.
19,208     usr/local/lib/node_modules/
17,456     usr/local/lib/node_modules/npm
14,308     usr/local/lib/node_modules/npm/node_modules  Contains 196 modules, the largest being node-gyp at 3,716 kB.
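
Figures like these can be collected with du. A rough sketch, where largest_paths is a hypothetical helper. Sizes are reported in kB of disk blocks, so they may differ slightly between file systems.

```shell
# Print the "$2" largest files and directories under "$1", sizes in kB.
largest_paths() {
    du -ak "$1" | sort -nr | head -n "$2"
}

# Example, from inside a layer's diff/ directory:
#   largest_paths usr/local 20
```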

So, there you have it. The node image is so large mostly because it contains a tonne of development .deb packages, and node-specific files take up ~164 MB, about 14% of the final image.

2023-09-24

[Technology] Galton Boards and OCI Containers


A friend recently showed me a Galton Board.  Think binomial distributions and maybe even Plinko.  (Wiki, and a simulation by Wolfgang Christian.)

I ended up implementing one of my own using TypeScript, partly as an excuse to play more with podman (an OCI container manager like Docker).

You can find the source code on gitlab and run it there, too: https://aquarichy.gitlab.io/kosmo-galton

Here are some random notes from the effort:

  • GitLab pipelines that deploy Pages only work on the default branch.  I tried to work on my .gitlab-ci.yml file in a side branch and that didn't work.   Oops.
  • I had a path hard-coded in my Makefile that I didn't really want to expose, so I had some fun in git branching to its commit, editing, rebasing, and then clearing my reflog and doing some gc. :D
  • Since typescript is available for npm, using the node:lts image that gitlab likes to suggest is great.  Just need to do "npm install -g typescript" first.
  • Podman images: I played with a variety of Containerfile approaches to look at resulting image size.  See below.

Containers

I played around with single and two-stage Containerfiles.  The point of two stage was to have typescript's tsc available to compile the code within a container, without needing to carry nodejs in the final image.  In general I used Apache's httpd.  

I appreciate that images are stored as layers and common layers are reused, so making use of, say, a 1GB container image (node) in 3 images of my own doesn't result in 3GB of additional disk space used.  Still, I find it valuable to be conscious of space and memory usage, and minimizing unnecessary files when preparing and deploying software.

Also, while I may "save space" by using smaller images, in this case, that results in extra network I/O and CPU time as I now end up needing to use dnf/microdnf/apk/npm to install nodejs/npm/typescript (depending on combination).

Single-stage:

  1. httpd: Pre-transcompile .ts into .js outside of the image; then use docker's httpd image and just add my HTML/JS/CSS files to it.

Two-stage (first compile ts; second deploy in Apache):

  2. fedora-minimal>httpd: First stage, start with the fedora-minimal image from registry.fedoraproject.org, install its typescript using microdnf (pulls in node), transcompile .ts; second stage, start with httpd again and copy over transcompiled .js, and add HTML/CSS from outside of the image.
  3. node>httpd: First stage, start with node image from docker, install typescript using npm, add and transcompile .ts; second stage like #2.
  4. alpine>httpd: First stage, start with tiny alpine image from docker, add npm using apk, then install typescript with npm.  The rest like #2 and #3.
  5. alpine>alpine: First stage, same as #4 using alpine; stage two, start with alpine again, add apache2 with apk, copy in HTML/CSS/JS as before, and add CMD to start httpd in the foreground (warning: dumb, simple configuration; not appropriate for real usage as is)
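
To make the two-stage idea concrete, here is a rough sketch of what the alpine>httpd variant could look like, written out from a shell function. It is an illustration rather than the exact Containerfile from the project. The file names (tsconfig.json, *.ts, *.html, *.css), the /build directory, the galton-example tag, and the assumption that tsc writes its .js next to the sources are all placeholders.

```shell
# Write an illustrative two-stage Containerfile to the path given as "$1".
write_containerfile() {
    cat > "$1" <<'EOF'
# Stage 1: compile TypeScript with tsc on top of alpine.
FROM docker.io/library/alpine AS build
RUN apk add --no-cache npm && npm install -g typescript
WORKDIR /build
COPY tsconfig.json *.ts ./
RUN tsc

# Stage 2: serve the result from the stock httpd image (no nodejs here).
FROM docker.io/library/httpd
COPY --from=build /build/*.js /usr/local/apache2/htdocs/
COPY *.html *.css /usr/local/apache2/htdocs/
EOF
}

# Example:
#   write_containerfile Containerfile
#   podman build -t galton-example .
```

Only the second stage ends up in the final image, so nodejs, npm and typescript stay behind in the build stage.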

For base image sizes:

  • docker.io/library/alpine: 7.63MB
  • registry.fedoraproject.org/library/fedora-minimal: 97.8MB
  • registry.fedoraproject.org/library/fedora: 196MB
  • docker.io/library/httpd: 173MB
  • docker.io/library/node: 1.12GB

For my intermediary and final images, I end up with:

Approach  Stage  Image description                    Modified image size  Base image size  Note
1         (n/a)  httpd                                173 MB               173 MB           No notable change!
2         1      fedora-minimal + nodejs, typescript  332 MB               97 MB            nodejs + typescript add a lot
3         1      node + typescript                    1.18 GB              1.12 GB          node image starts off big
4         1      alpine + npm, typescript             133 MB               7.63 MB          nodejs, npm and typescript adding a lot again
5         2      alpine + httpd                       13.5 MB              7.63 MB          so tiny vs. the 173 MB httpd image!

This makes it worthwhile to tag and keep a container image with TypeScript already set up. :)

2023-09-05

[Technology] systemd timers and service files in place of cron

I keep a daily journal in an Emacs org-mode file.  Sometimes due to clumsy keypresses in Emacs, I have temporarily deleted (!) lines or sections (!).  I think I have always caught these mistakes, but to be safe, I decided to create a git repository for it and do a daily commit.

In the past, I'd have used a cron job, but in recent years I've tried to make more and more use of systemd and timers.  Is one better than the other?  I'm going to ignore that question and proceed. :D

For this task, which I'm calling "log_git_committer" (very uninspired), I have four files:

  • log_git_committer.sh - a Bash script that handles actually committing changes
  • log_git_committer.service - a systemd service file that references the .sh file
  • log_git_committer.timer - a systemd timer file that implicitly references the .service unit and describes the interval
  • Makefile - a simple make file to install and uninstall my files as I work on it

Below are some of the more interesting notes from working on it, and the systemd files. 

Notes

Some notable pieces that came up:

  • .service:
    • Type=oneshot: since I'm running this from a timer, I just want this service unit to be a "oneshot", as it's not actually a service that would keep running in the background.
    • ExecStart= and %h: I'm installing the shell script locally in my ~/.local/bin directory.  ExecStart in a .service won't accept ~/ or $HOME/ in a path, preferring absolute paths.  However, systemd has some specifiers like %h that will substitute in the user's home directory when a service unit is run with --user.
  • .timer:
    • OnCalendar=: it has a lot of options that are similar to cron.  Neat. 
    • retroactively running missed timers?  Yes, if I set a timer to run at 3AM but I put my computer to sleep at 1AM and turn it back on at 8AM, the timer will then execute.  If I end up in Narnia and my computer is asleep for 3 days while I'm away (a lifetime there), it will then run the service just once to catch up.  Nice.  (Persistent=true in the .timer file extends this catch-up to runs missed while the machine was powered off.)
  • Makefile:
    • unit file syntax checking: systemd provides a command that lets you analyze your unit files to check for correctness. E.g.
      systemd-analyze verify log_git_committer.timer
      systemd-analyze verify log_git_committer.service

      so I added that to a target.
    • systemd install directory: for non-root user services and timers, the install directory is ~/.config/systemd/user/
      Some other locations on my Fedora 38 system for systemd unit files include:
      • /usr/lib/systemd/user/
      • /usr/lib/systemd/system/
      • /etc/systemd/user/
      • /etc/systemd/system/
    • systemd and reloading updated .service files: if I make a change to a .service file, systemd would like me to reload the new one from disk.  So, in my Makefile, after I copy .service and .timer files into their install directory, I call:
      systemctl --user daemon-reload
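
Put together, the install steps from these notes might look like the following shell sketch (a stand-in for the Makefile target, which isn't reproduced here). install_units is a hypothetical helper. It expects the two unit files in the current directory, and the systemctl lines are left as comments because they need a running user session.

```shell
# Copy the unit files from the current directory into the per-user
# systemd directory (or into "$1", if given).
install_units() {
    unit_dir="${1:-$HOME/.config/systemd/user}"
    mkdir -p "$unit_dir" &&
        cp log_git_committer.service log_git_committer.timer "$unit_dir/"
}

# Afterwards, in a login session:
#   systemctl --user daemon-reload
#   systemctl --user enable --now log_git_committer.timer
#   systemctl --user list-timers
```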

log_git_committer.sh

If there's interest, I can share this, but it's pretty simple and straightforward.

log_git_committer.service

[Unit]
Description=Commit changes to a log file

[Service]
Type=oneshot
ExecStart=%h/.local/bin/log_git_committer.sh

[Install]
WantedBy=multi-user.target

log_git_committer.timer

[Unit]
Description=Do a daily commit of the log file.

[Timer]
OnCalendar=3:00:00
Persistent=true

[Install]
WantedBy=timers.target


[Technology] web development: TypeScript!

I used to be a snob about various technologies.  Two decades ago, I hated doing web development with JavaScript, dealing with browser quirks and what felt like deficiencies in the base standard and browser client libraries.  It felt like you couldn't write simple, safe and predictable code, especially without relying on a third party framework or library.

However, much has changed and I genuinely enjoy it now.  I still often prefer to write code without third-party libraries or frameworks if I can afford to, but I have increasingly come to rely on TypeScript.  I value type checking a lot, especially as projects grow and APIs evolve.  Managing types explicitly entails more work earlier on, but it also helps spare me from silly bugs and saves me time during refactoring or when revisiting code I wrote a while ago.  

TypeScript works by transcompiling TypeScript (in .ts files) into pure JavaScript.  Some of my favourite features include:

  • getting to set language standard targets in configuration.  E.g. setting your target to "es2016" will mean the generated JavaScript will down-compile newer syntax into syntax that exists in ECMAScript 2016.  (Note: this doesn't polyfill API functionality!)
  • easily defining interface types for simple objects with fixed property names

There's some effort to promote a native static type system in future ECMAScript, like this proposal: https://github.com/tc39/proposal-type-annotations.  I hope for this friendlier future.

2023-09-03

[Technology] firejail: sandboxing with Linux namespaces

I am generally fairly wary of the amount of access software has on my computer.  Consequently, I like to use firejail on Linux to sandbox a lot of applications.  E.g. am I playing a single-player game from itch.io?  It doesn't need access to my mount points, my home directory (beyond the game's own directory for the program, game files and save data) or to the network.  Sometimes I am stunned by how much trust I put into random software back in 1998.

It uses Linux namespaces and seccomp-bpf.

An example command of what I might use would be:

$ firejail --net=none --disable-mnt --whitelist=/home/myuser/files/games/game_title/ ./game.sh

Some common programs have pre-existing profiles defined by firejail, like firefox, and those can be found in /etc/firejail.  In the case of firefox, one notable change is access to your home directory: it gets restricted to just your downloads folder!
