2015-08-19

[Technology] Automating my life away




A decade ago, part of a job was to help automate creation
of a report that normally involved a human scanning through dozens of
spreadsheets each year, copying and pasting numbers and generating
graphs and tables, manually.

The place where I worked
predominantly used Perl.  (It was a science laboratory.)   My solution
was to use OpenOffice's GUI to bulk-convert the Microsoft Excel
spreadsheets (binary XLS files) to whichever file format OpenOffice used
at the time that was actually a collection of XML in a zipped
directory. (Was that .sdc?)

I then used Perl's libxml
wrapper (probably wrapping libxml2) to parse the XML into an SQLite
database.  (Or was it PostgeSQL like everything else there?)  I had
tried to use some other Perl XML library, but it was Super Slow compared
to the snappy libxml2 (written in C).  Like, a difference in many
minutes to run versus one second.


From there, I wrote
some Perl CGI scripts to automatically generate most of the tables and
graphics into HTML form, in some fashion that looked nice when imported
into OpenOffice. (Really?!)  The person who was responsible for the
report, who normally took a few weeks to do, said that it would save him
so much of the time, that it would take him now a couple days to look
at the data and write conclusions and maybe do some cleaning.  He
marveled at automation and how quickly he could be replaced.  The
automation revolution in action.


Today I used LibreOffice to convert about a hundred spreadsheets into CSV files with this one command:

find . -name "*.xls" | while read line; do
  (cd "$(dirname "$line")" &&
   libreoffice --headless --convert-to csv "$(basename "$line")");
done





I
am not sure that OpenOffice could do headless conversion back then.  I
was probably overwhelmed by all the potential everywhere to discover the
functionality if it exists.  This would have helped free a few hours of
my life.

Automation, a revolution.



UPDATE:

Converting to CSV only does the 1st sheet tab.  You apparently can't specify a different sheet to be the active sheet.  I also had to convert some other old spreadsheets for easy parsing (mostly just grep'ing through them to find data).   In that case, I used --convert-to html which was beautiful, even if one of the resultant HTML files was 138MB large.

Keine Kommentare:

Kommentar veröffentlichen

Blog Archive

Dieses Blog durchsuchen

Labels

#Technology #GNOME gnome gxml fedora bugs linux vala google #General firefox security gsoc GUADEC android bug xml fedora 18 javascript libxml2 programming web blogger encryption fedora 17 gdom git emacs libgdata memory mozilla open source serialisation upgrade web development API Spain containers design evolution fedora 16 fedora 20 fedora 22 fedup file systems friends future glib gnome shell internet luks music performance phone photos php podman preupgrade tablet testing typescript yum #Microblog Network Manager adb apache art automation bash brno catastrophe css data loss debian debugging deja-dup disaster docker emusic errors ext4 facebook fedora 19 gee gir gitlab gitorious gmail gobject google talk google+ gtk html libxml mail microsoft mtp mysql namespaces nautilus nextcloud owncloud picasaweb pitivi ptp python raspberry pi resizing rpm school selinux signal sms speech dispatcher systemd technology texting time management uoguelph usability video web design youtube #Tech Air Canada C Electron Element Empathy Europe GError GNOME 3 GNOME Files Go Google Play Music Grimes IRC Mac OS X Mario Kart Memento Nintendo Nintendo Switch PEAP Selenium Splatoon UI VPN Xiki accessibility advertising ai albums anaconda anonymity apple ask asus eee top automake autonomous automobiles b43 backup battery berlin bit rot broadcom browsers browsing canada canadian english cars chrome clarity comments communication compiler complaints computer computers configuration console constructive criticism cron cropping customisation dataloss dconf debug symbols design patterns desktop summit development discoverability distribution diy dnf documentation drm duplicity e-mail efficiency email english environment estate experimenting ext3 fedora 11 festival file formats firejail flac flatpak forgottotagit freedom friendship fuse galaxy nexus galton gay rights gdb german germany gimp gio gjs gnome software gnome-control-center google assistant google calendar google chrome google hangouts google reader gqe graphviz growth gtest gtg gvfs gvfs metadata hard drive hard drives hardware help hp humour ide identity instagram installation instant messaging integration intel interactivity introspection jabber java java 13 jobs kernel keyboard language language servers languages law learning lenovo letsencrypt libreoffice librpm life livecd liveusb login lsp macbook maintainership mariadb mario matrix memory leaks messaging mounting mouse netflix new zealand node nodelist numix obama oci ogg oggenc oh the humanity open open standards openoffice optimisation org-mode organisation package management packagekit paint shedding parallelism pdo perl pipelight privacy productivity progress progressive web apps pumpkin pwa pyright quality recursion redhat refactoring repairs report rhythmbox rust sandboxes scheduling screenshots self-navigating car shell sleep smartphones software software engineering speed sql ssd synergy tabs test tests themes thesis tracker travel triumf turtles tv tweak twist typing university update usb user experience valadoc video editing volunteering vpnc waf warm wayland weather web apps website wifi wiki wireless wishes work xinput xmpp xorg xpath
Powered by Blogger.