.plan: September 2023

2023-09-25

[Technology] docker.io/library/node container image size

Posted by Richard at 13:42
Labels: #Technology, containers, debian, docker, node, podman

As seen in my last post, the node image on Docker Hub is over 1GB. I was curious what contributed to that, so I started a container from it and attached to a shell inside.

$ podman run -dt node bash
$ podman attach -l
root@...:/#

A quick query of installed packages and their sizes gave me these top 30 packages:

dpkg-query -Wf '${Installed-Size}\t${Package}\n' | sort -nr | head -n 30

Size (kB)	Deb Package	% (of total installed package size)
68,236	gcc-12	7.65
47,784	libicu-dev	5.36
44,477	git	4.99
36,170	libicu72	4.05
36,161	g++-12	4.05
33,848	cpp-12	3.79
28,862	libperl5.36	3.23
19,446	libstdc++-12-dev	2.18
18,062	coreutils	2.02
17,816	perl-modules-5.36	1.99
16,203	libx265-199	1.81
15,021	binutils-common	1.68
14,577	mercurial-common	1.63
14,284	libgcc-12-dev	1.60
12,986	libc6	1.45
12,303	libssl-dev	1.38
11,957	libc6-dev	1.34
11,428	binutils-x86-64-linux-gnu	1.28
10,909	librsvg2-2	1.22
10,076	sq	1.13
10,075	libglib2.0-dev	1.13
9,404	libglib2.0-data	1.05
8,323	libpython3.11-stdlib	0.93
8,130	libmagic-mgc	0.91
8,017	libasan8	0.89
7,815	libtsan2	0.87
7,638	perl-base	0.85
7,164	bash	0.80
6,761	python3.11-minimal	0.75
6,589	linux-libc-dev	0.73

Surprisingly, no packages for nodejs/npm are listed! I calculated the sum of the sizes of all installed packages and got 891,320 kB, but overall disk usage in the image's final file system is 1,139,008 kB. That's a discrepancy of 247,688 kB! Poking around for evidence of it, I see that npm is installed at /usr/local/bin/npm. All of /usr/local takes up 167,728 kB in the final image. Let's see which layer added that.
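
The totals and percentages above can be reproduced from the dpkg-query output. Here's a minimal TypeScript sketch, assuming the tab-separated `${Installed-Size}\t${Package}` lines produced by the command above; the function names are my own, and the filesystem total is assumed to be measured separately (e.g. with du). Installed-Size is only dpkg's estimate in KiB, so this approximates what is actually on disk.

```typescript
// Sketch: summarize `dpkg-query -Wf '${Installed-Size}\t${Package}\n'` output.
interface PackageSize {
  name: string;
  sizeKb: number; // dpkg's Installed-Size estimate, in KiB
}

interface RankedPackage extends PackageSize {
  percent: number; // share of the summed Installed-Size of all packages
}

function parseDpkgQuery(output: string): PackageSize[] {
  return output
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const [size = "", name = ""] = line.split("\t");
      return { name, sizeKb: Number(size) };
    })
    // Drop lines where Number() yields NaN.
    .filter((pkg) => Number.isFinite(pkg.sizeKb));
}

function topPackages(pkgs: PackageSize[], n: number): { totalKb: number; top: RankedPackage[] } {
  const totalKb = pkgs.reduce((sum, pkg) => sum + pkg.sizeKb, 0);
  const top = [...pkgs]
    .sort((a, b) => b.sizeKb - a.sizeKb)
    .slice(0, n)
    .map((pkg) => ({ ...pkg, percent: (100 * pkg.sizeKb) / totalKb }));
  return { totalKb, top };
}

// Space on the filesystem not accounted for by any package's Installed-Size.
function unaccountedKb(filesystemKb: number, packagesKb: number): number {
  return filesystemKb - packagesKb;
}

console.log(unaccountedKb(1_139_008, 891_320)); // 247688
```

The percentages in the table appear to be truncated rather than rounded (68,236 / 891,320 is about 7.656%), so a rounding formatter may differ from it in the last digit.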

$ podman image tree node
Image ID: 386e0be86bde
Tags:     [docker.io/library/node:latest]
Size:     1.122GB
Image Layers
├── ID: 7c85cfa30cb1 Size: 121.3MB
├── ID: a9af9831483f Size: 49.56MB
├── ID: 68731f6c1e1e Size: 181.4MB
├── ID: d396cdbdc3ad Size: 596.9MB
├── ID: 02f1f39c8ec9 Size: 22.53kB
├── ID: 4709b21dad1d Size: 164.8MB
├── ID: 24d9598c79a8 Size: 7.626MB
└── ID: 19ba6ee0c874 Size: 3.584kB Top Layer of: [docker.io/library/node:latest]

$ podman history node
ID            CREATED     CREATED BY                                                                   SIZE              COMMENT
386e0be86bde  4 days ago  /bin/sh -c #(nop)  CMD ["node"]                                              0B                
<missing>     4 days ago  /bin/sh -c #(nop)  ENTRYPOINT ["docker-ent...                                0B                
<missing>     4 days ago  /bin/sh -c #(nop) COPY file:4d192565a7220e...                                3.58kB            
<missing>     4 days ago  /bin/sh -c set -ex   && for key in     6A0...                                7.63MB            
<missing>     4 days ago  /bin/sh -c #(nop)  ENV YARN_VERSION=1.22.19                                  0B                
<missing>     4 days ago  /bin/sh -c ARCH= && dpkgArch="$(dpkg --pri...                                165MB             
<missing>     4 days ago  /bin/sh -c #(nop)  ENV NODE_VERSION=20.7.0                                   0B                
<missing>     4 days ago  /bin/sh -c groupadd --gid 1000 node   && u...                                22.5kB            
<missing>     5 days ago  /bin/sh -c set -ex;                            apt-get update;   apt-...     597MB       
<missing>     5 days ago  /bin/sh -c apt-get update && apt-get insta...                                181MB             
<missing>     5 days ago  /bin/sh -c set -eux;                           apt-get update;   apt...      49.6MB      
<missing>     5 days ago  /bin/sh -c #(nop)  CMD ["bash"]                                              0B                
<missing>     5 days ago  /bin/sh -c #(nop) ADD file:ce04d6a354feaef...                                0B
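
As a sanity check on the tree output, the layer sizes add up to the reported 1.122GB, and the largest layer (596.9MB, the 597MB apt-get step in the history) is over half of the image, while the 164.8MB layer is just under 15%:

```latex
\begin{aligned}
\text{total} &= 121.3 + 49.56 + 181.4 + 596.9 + 0.02253 + 164.8 + 7.626 + 0.003584 \approx 1121.6\ \text{MB} \\
596.9 / 1121.6 &\approx 53.2\% \\
164.8 / 1121.6 &\approx 14.7\%
\end{aligned}
```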

Taking a peek in ~/.local/share/containers/storage/overlay and doing a find against each of the layer IDs, looking for the npm binary, I get this:

./4709b21dad1d5d5cb88803c9c598c81fd9a94de38398ad002ec569cff33445c7/diff/usr/local/bin/npm
./4709b21dad1d5d5cb88803c9c598c81fd9a94de38398ad002ec569cff33445c7/diff/usr/local/lib/node_modules/corepack/shims/nodewin/npm
./4709b21dad1d5d5cb88803c9c598c81fd9a94de38398ad002ec569cff33445c7/diff/usr/local/lib/node_modules/corepack/shims/npm
./4709b21dad1d5d5cb88803c9c598c81fd9a94de38398ad002ec569cff33445c7/diff/usr/local/lib/node_modules/npm
./4709b21dad1d5d5cb88803c9c598c81fd9a94de38398ad002ec569cff33445c7/diff/usr/local/lib/node_modules/npm/bin/npm

Peeking at its directory structure, I see that the layer is 167,712 kB, accounts for all of /usr/local, and is entirely node files. Some of the most notable disk usage is as follows:

Size (kB)	Path	Comment
93,524	usr/local/bin/node	The binary itself.
53,004	usr/local/include/node/openssl/	Most of this in its archs/ subdirectory, with 21 architectures' .h files.
19,208	usr/local/lib/node_modules/
17,456	usr/local/lib/node_modules/npm
14,308	usr/local/lib/node_modules/npm/node_modules	Contains 196 modules, the largest being node-gyp at 3,716 kB.
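
The rows in that table nest (node_modules/ contains npm, which contains npm/node_modules), so they overlap rather than add up. A du-style rollup charges each file's size to every one of its ancestor directories. Here's a small sketch of that idea over a hypothetical list of (path, size) pairs; the sizes are made up, and since it sums apparent file sizes it won't exactly match du's block-based numbers.

```typescript
// Sketch of a du-style rollup: each file's size is charged to every directory
// above it. Input paths are relative, e.g. "usr/local/bin/node".
interface SizedFile {
  path: string;
  sizeKb: number;
}

function directoryTotals(files: SizedFile[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const file of files) {
    const parts = file.path.split("/").filter((part) => part !== "");
    // Every proper prefix of the path names an ancestor directory.
    for (let depth = 1; depth < parts.length; depth++) {
      const dir = parts.slice(0, depth).join("/");
      totals.set(dir, (totals.get(dir) ?? 0) + file.sizeKb);
    }
  }
  return totals;
}

// Hypothetical sizes, just to show how nested totals overlap.
const layerFiles: SizedFile[] = [
  { path: "usr/local/bin/node", sizeKb: 100 },
  { path: "usr/local/lib/node_modules/npm/index.js", sizeKb: 2 },
  { path: "usr/local/lib/node_modules/npm/node_modules/gyp/gyp.js", sizeKb: 5 },
  { path: "usr/local/lib/node_modules/corepack/index.js", sizeKb: 1 },
];
const totals = directoryTotals(layerFiles);
console.log(totals.get("usr/local/lib/node_modules")); // 8
console.log(totals.get("usr/local/lib/node_modules/npm")); // 7
console.log(totals.get("usr/local")); // 108
```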

So, there you have it. The node image is so large mostly because it contains a tonne of development .deb packages, and node-specific files take up ~164 MB, nearly 15% of the final image.

2023-09-24

[Technology] Galton Boards and OCI Containers

Posted by Richard at 15:07
Labels: #Technology, containers, galton, git, gitlab, oci, podman, typescript

A friend recently showed me a Galton Board. Think binomial distributions and maybe even Plinko. (Wiki, and a simulation by Wolfgang Christian.)

I ended up implementing one of my own in TypeScript, partly as an excuse to play more with podman (an OCI container manager like Docker).

You can find the source code on gitlab and run it there, too: https://aquarichy.gitlab.io/kosmo-galton
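
The model behind a Galton board is small enough to sketch: at each of n rows of pegs a ball bounces left or right with equal probability, so its final bin is just its count of rightward bounces, which is binomially distributed. Here's a minimal TypeScript sketch of that model; it is not the code from the repository, and the function names are my own. The random source is a parameter so it can be swapped for a deterministic one.

```typescript
// Minimal Galton board model (not the kosmo-galton code).
type Rng = () => number; // returns a float in [0, 1)

// Drop `balls` balls through `rows` rows of pegs; returns counts per bin 0..rows.
function dropBalls(rows: number, balls: number, rng: Rng = Math.random): number[] {
  const bins = new Array<number>(rows + 1).fill(0);
  for (let i = 0; i < balls; i++) {
    let bin = 0;
    for (let r = 0; r < rows; r++) {
      if (rng() < 0.5) bin++; // counts as a bounce to the right
    }
    bins[bin] = (bins[bin] ?? 0) + 1;
  }
  return bins;
}

// Expected count per bin: balls * C(rows, k) / 2^rows.
function expectedCounts(rows: number, balls: number): number[] {
  const expected: number[] = [];
  let coeff = 1; // C(rows, 0)
  for (let k = 0; k <= rows; k++) {
    expected.push((balls * coeff) / 2 ** rows);
    coeff = (coeff * (rows - k)) / (k + 1); // C(rows, k + 1)
  }
  return expected;
}

// Text histogram of one run next to the binomial expectation.
const rows = 10;
const balls = 2_000;
const observed = dropBalls(rows, balls);
const expected = expectedCounts(rows, balls);
observed.forEach((count, k) => {
  const bar = "#".repeat(Math.round(count / 20));
  console.log(`${k} ${bar} (${count}, expected ~${(expected[k] ?? 0).toFixed(0)})`);
});
```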

Here are some random notes from the effort:

GitLab pipelines that deploy Pages only work on the default branch. I tried to work on my .gitlab-ci.yml file in a side branch and that didn't work. Oops.
I had a path hard-coded in my Makefile that I didn't really want to expose, so I had some fun in git: branching from the commit that introduced it, editing, rebasing, and then clearing my reflog and doing some gc. :D
Since TypeScript is available through npm, the node:lts image that GitLab likes to suggest works great. I just need to run "npm install -g typescript" first.
Podman images: I played with a variety of Containerfile approaches to look at resulting image size. See below.

Containers

I played around with single- and two-stage Containerfiles. The point of the two-stage approach was to have TypeScript's tsc available to compile the code within a container, without needing to carry nodejs in the final image. In general, I used Apache's httpd to serve the result.

I appreciate that images are stored as layers and common layers are reused, so making use of, say, a 1GB container image (node) in 3 images of my own doesn't result in 3GB of additional disk space used. Still, I find it valuable to be conscious of space and memory usage, and to minimize unnecessary files when preparing and deploying software.
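
To put rough numbers on that layer reuse: because each layer is stored once and shared by every image built on it, the disk cost of several images is the size of the union of their layers, not the sum of their image sizes. A toy model, with made-up layer IDs and sizes:

```typescript
// Toy model of layer sharing: each image is a list of layers (hypothetical IDs).
interface Layer {
  id: string;
  sizeMb: number;
}

// What you'd pay if every image stored its own copy of every layer.
function naiveTotalMb(images: Layer[][]): number {
  return images.reduce((sum, layers) => sum + layers.reduce((s, l) => s + l.sizeMb, 0), 0);
}

// What a layer store needs: each distinct layer ID counted once.
function sharedTotalMb(images: Layer[][]): number {
  const seen = new Set<string>();
  let total = 0;
  for (const layers of images) {
    for (const layer of layers) {
      if (!seen.has(layer.id)) {
        seen.add(layer.id);
        total += layer.sizeMb;
      }
    }
  }
  return total;
}

// Three small apps built on the same ~1GB base image.
const base: Layer = { id: "node-base", sizeMb: 1120 };
const images: Layer[][] = [
  [base, { id: "app-a", sizeMb: 60 }],
  [base, { id: "app-b", sizeMb: 40 }],
  [base, { id: "app-c", sizeMb: 20 }],
];
console.log(naiveTotalMb(images)); // 3480
console.log(sharedTotalMb(images)); // 1240
```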

Also, while I may "save space" by using smaller images, in this case doing so results in extra network I/O and CPU time, as I now need to use dnf/microdnf/apk/npm to install nodejs/npm/typescript (depending on the combination).

Single-stage:

1. httpd: Pre-transcompile .ts into .js outside of the image, then use Docker's httpd image and just add my HTML/JS/CSS files to it.

Two-stage (first compile the TypeScript; then deploy in Apache):

2. fedora-minimal>httpd: First stage, start with the fedora-minimal image from registry.fedoraproject.org, install its typescript package using microdnf (which pulls in nodejs), and transcompile the .ts; second stage, start with httpd again, copy over the transcompiled .js, and add HTML/CSS from outside of the image.
3. node>httpd: First stage, start with the node image from Docker, install typescript using npm, then add and transcompile the .ts; second stage like #2.
4. alpine>httpd: First stage, start with the tiny alpine image from Docker, add npm using apk, then install typescript with npm. The rest is like #2 and #3.
5. alpine>alpine: First stage, same as #4; second stage, start with alpine again, add apache2 with apk, copy in HTML/CSS/JS as before, and add a CMD to run httpd in the foreground (warning: dumb, simple configuration; not appropriate for real usage as is).

For base image sizes:

docker.io/library/alpine: 7.63MB
registry.fedoraproject.org/library/fedora-minimal: 97.8MB
registry.fedoraproject.org/library/fedora: 196MB
docker.io/library/httpd: 173MB
docker.io/library/node: 1.12GB

For my intermediary and final images, I end up with:

Approach	Stage	Image description	Modified image size	Base image size	Note
1	(n/a)	httpd	173 MB	173 MB	No notable change!
2	1	fedora-minimal + nodejs, typescript	332 MB	97.8 MB	nodejs + typescript add a lot
3	1	node + typescript	1.18 GB	1.12 GB	node image starts off big
4	1	alpine + npm, typescript	133 MB	7.63 MB	nodejs, npm and typescript adding a lot again
5	2	alpine + httpd	13.5 MB	7.63 MB	so tiny vs. the 173 MB httpd image!

This makes it worthwhile to tag and keep an image with TypeScript already set up. :)

2023-09-05

[Technology] systemd timers and service files in place of cron

Posted by Richard at 20:17
Labels: #Technology, cron, emacs, linux, org-mode, systemd

I keep a daily journal in an Emacs org-mode file. Sometimes due to clumsy keypresses in Emacs, I have temporarily deleted (!) lines or sections (!). I think I have always caught these mistakes, but to be safe, I decided to create a git repository for it and do a daily commit.

In the past, I'd have used a cron job, but in recent years I've tried to make more and more use of systemd and timers. Is one better than the other? I'm going to ignore that question and proceed. :D

For this task, which I'm calling "log_git_committer" (very uninspired), I have four files:

log_git_committer.sh - a Bash script that handles actually committing changes
log_git_committer.service - a systemd service file that references the .sh file
log_git_committer.timer - a systemd timer file that implicitly references the .service unit of the same name and describes the interval
Makefile - a simple make file to install and uninstall my files as I work on it

Below are some of the more interesting notes from working on it, and the systemd files.

Notes

Some notable pieces that came up:

.service:

Type=oneshot: since I'm running this from a timer, I just want this service unit to be a "oneshot", as it's not actually a service that would keep running in the background.
ExecStart= and %h: I'm installing the shell script locally in my ~/.local/bin directory. ExecStart= in a .service won't accept ~/ or $HOME/ in the program's path, preferring absolute paths. However, systemd has specifiers like %h, which expands to the user's home directory when the unit is run by the user's service manager (i.e. with --user).

.timer:

OnCalendar=: it has a lot of options that are similar to cron. Neat.
retroactively running missed timers? Yes, if I set a timer to run at 3AM but I put my computer to sleep at 1AM and turn it back on at 8AM, the timer will then execute. If I end up in Narnia and my computer is asleep for 3 days while I'm away (a lifetime there), it will then run the service just once to catch up. Nice.

Makefile:

unit file syntax checking: systemd provides a command that lets you analyze your unit files to check for correctness. E.g.
systemd-analyze verify log_git_committer.timer
systemd-analyze verify log_git_committer.service
so I added that to a target.
systemd install directory: for non-root user services and timers, the install directory is ~/.config/systemd/user/
Some other locations on my Fedora 38 system for systemd unit files include:

/usr/lib/systemd/user/
/usr/lib/systemd/system/
/etc/systemd/user/
/etc/systemd/system/

systemd and reloading updated .service files: if I make a change to a .service file, systemd would like me to reload the new one from disk. So, in my Makefile, after I copy .service and .timer files into their install directory, I call:
systemctl --user daemon-reload

log_git_committer.sh

If there's interest, I can share this, but it's pretty simple and straightforward.

log_git_committer.service

[Unit]
Description=Commit changes to a log file

[Service]
Type=oneshot
ExecStart=%h/.local/bin/log_git_committer.sh

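# No [Install] section needed: log_git_committer.timer starts this unit,
# and a --user service manager has no multi-user.target to hook into anyway.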

log_git_committer.timer

[Unit]
Description=Do a daily commit of the log file.

[Timer]
OnCalendar=3:00:00
Persistent=true

[Install]
WantedBy=timers.target
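
To make the catch-up behaviour from the notes concrete: systemd.timer(5) describes Persistent=true as triggering the service immediately if it would have been triggered at least once while the timer was inactive. Here's a toy TypeScript model of that rule for a daily OnCalendar=3:00:00; it is only an illustration, not how systemd computes elapse times (it handles one fixed time per day, in local time), and the function names are hypothetical.

```typescript
// Toy model of a daily `OnCalendar=HH:MM:SS` timer with `Persistent=true`.
// Hypothetical helpers; handles only one fixed time per day, in local time.

// First daily elapse strictly after `after`.
function nextElapse(after: Date, hour: number, minute = 0, second = 0): Date {
  const next = new Date(after.getTime());
  next.setHours(hour, minute, second, 0);
  if (next.getTime() <= after.getTime()) {
    next.setDate(next.getDate() + 1);
  }
  return next;
}

// How many scheduled elapses fell between the last run and now.
function missedElapses(lastRun: Date, now: Date, hour: number): number {
  let count = 0;
  for (let t = nextElapse(lastRun, hour); t.getTime() <= now.getTime(); t = nextElapse(t, hour)) {
    count++;
  }
  return count;
}

// Persistent=true: if any elapse was missed, the service runs once, not once per miss.
function catchUpRuns(lastRun: Date, now: Date, hour: number): number {
  return missedElapses(lastRun, now, hour) > 0 ? 1 : 0;
}

// Last commit at 03:00 on Sept 5; the machine comes back at 08:00 on Sept 8.
const lastRun = new Date(2023, 8, 5, 3, 0, 0);
const backFromNarnia = new Date(2023, 8, 8, 8, 0, 0);
console.log(missedElapses(lastRun, backFromNarnia, 3)); // 3
console.log(catchUpRuns(lastRun, backFromNarnia, 3)); // 1
```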


[Technology] web development: TypeScript!

Posted by Richard at 19:00
Labels: #Technology, javascript, typescript, typing, web development

I used to be a snob about various technologies. Two decades ago, I hated doing web development with JavaScript, dealing with browser quirks and what felt like deficiencies in the base standard and browser client libraries. It felt like you couldn't write simple, safe and predictable code, especially without relying on a third party framework or library.

However, much has changed and I genuinely enjoy it now. I still often prefer to write code without third-party libraries or frameworks if I can afford to, but I have increasingly come to rely on TypeScript. I value type checking a lot, especially as projects grow and APIs evolve. Managing types explicitly entails more work earlier on, but it also helps spare me from silly bugs and saves me time during refactoring or when revisiting code I wrote a while ago.

TypeScript works by transcompiling TypeScript (in .ts files) into pure JavaScript. Some of my favourite features include:

being able to set the target language standard in configuration. E.g. setting the target to "es2016" makes the compiler down-compile newer syntax into syntax that exists in ECMAScript 2016. (Note: this doesn't polyfill API functionality!)
easily defining interface types for simple objects with fixed property names
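
Both features fit in one small sketch: an interface for a fixed-shape object, plus optional chaining (?.) and nullish coalescing (??), which are ES2020 syntax. With "target": "es2016" in tsconfig.json, tsc rewrites those operators into equivalent ES2016 code, but it would not add missing runtime APIs. The Post type and its fields are made up for illustration.

```typescript
// Hypothetical shape for a blog post: an object literal missing `title` or
// `date`, or with a misspelled property name, is a compile-time error.
interface Post {
  title: string;
  date: string; // e.g. "2023-09-05"
  labels?: string[];
  author?: { name: string };
}

function byline(post: Post): string {
  // `?.` and `??` are ES2020 syntax; tsc down-compiles them for older targets.
  const name = post.author?.name ?? "anonymous";
  const labelCount = post.labels?.length ?? 0;
  return `${post.title} (${post.date}) by ${name}, ${labelCount} label(s)`;
}

console.log(byline({ title: "TypeScript!", date: "2023-09-05", labels: ["typescript"] }));
// TypeScript! (2023-09-05) by anonymous, 1 label(s)
```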

There's some effort to bring type annotations into future ECMAScript itself, like this proposal: https://github.com/tc39/proposal-type-annotations, under which engines would ignore the annotations and leave the checking to tools. I hope for this friendlier future.

2023-09-03

[Technology] firejail: sandboxing with Linux namespaces

Posted by Richard at 16:04
Labels: #Technology, firejail, linux, sandboxes, security

I am generally fairly wary of how much access software has on my computer. Consequently, I like to use firejail on Linux to sandbox a lot of applications. E.g. am I playing a single-player game from itch.io? It doesn't need access to my mount points, my home directory (beyond the game's own directory for the program, game files and save data), or the network. Sometimes I am stunned by how much trust I put into random software back in 1998.

It uses Linux namespaces and seccomp-bpf.

An example command of what I might use would be:

$ firejail --net=none --disable-mnt --whitelist=/home/myuser/files/games/game_title/ ./game.sh

Some common programs, like Firefox, have pre-existing profiles that ship with firejail; those can be found in /etc/firejail. In Firefox's case, one notable change is access to your home directory: it gets restricted to just your downloads folder!
