[Technology] Efficiency!

I was using a script for a project that applied a set of transformation and filtering rules to some data. There are only about 60 rules right now, and there is an average of 272 data records each day. I originally wrote it quickly as a bash script using the command-line tools sed (for replacing a matched pattern with some string) and grep (for filtering based on a matched pattern).

The run time within a given month grows as the month goes on, and on average it was taking over 5 minutes (!) to run those 62 rules. Applying all the rules to a single data record was taking almost a full second. This wasn't really scalable.

So, today I ported the script to Vala, a C#-like language that compiles to C, which is then compiled into native code. That promised to be much quicker than interpreted code to begin with. Once converted, the run time was less than 1 second. That was a pleasant surprise.

Some of the fun was moving the transformation rules into their own file (XML), separate from the code, and making sure all the Regex objects were created once at the start of the program and then applied to all later data.
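
As a rough illustration of that compile-once idea, here is a minimal sketch in Python rather than Vala. The XML layout (`replace` and `drop` elements with `pattern` and `with` attributes), the function names, and the sample records are all invented for this example; the real rule file and data look different.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule file contents; element and attribute names are assumptions.
RULES_XML = r"""
<rules>
  <replace pattern="\s+" with=" "/>
  <replace pattern="^(\d{4})/(\d{2})/(\d{2})" with="\1-\2-\3"/>
  <drop pattern="^#"/>
</rules>
"""

def load_rules(xml_text):
    """Parse the rule file and compile every pattern exactly once."""
    rules = []
    for node in ET.fromstring(xml_text.strip()):
        rules.append((node.tag, re.compile(node.get("pattern")), node.get("with")))
    return rules

def apply_rules(record, rules):
    """Return the transformed record, or None if a drop rule matches."""
    for kind, regex, replacement in rules:
        if kind == "drop":
            if regex.search(record):
                return None
        else:
            record = regex.sub(replacement, record)
    return record

# Compile once at startup, then reuse the same objects for every record.
rules = load_rules(RULES_XML)
records = [
    "2014/03/09   lunch    12.50",
    "# comment line",
    "2014/03/10 coffee 3.00",
]
cleaned = [r for r in (apply_rules(rec, rules) for rec in records) if r is not None]
print(cleaned)
```

The point of the sketch is that `re.compile` runs once per rule, not once per rule per record, and that adding a rule means editing the XML rather than the code.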

Making it better still would probably involve parsing the data records into a structured form from the start, rather than just using regular expressions to transform data strings, and then creating a couple of different types of rules for specific fields. But this is good enough for now. :)
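
For what it's worth, the structured approach might look something like the following Python sketch. Everything in it is an assumption made for illustration: the tab-separated record layout, the `Record` fields, and the two rule types (`FieldRule` and `AmountFilter`) are hypothetical and not taken from the actual script.

```python
import re
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Record:
    # Hypothetical fields; the real records are not described in this post.
    date: str
    description: str
    amount: float

def parse_record(line):
    """Split one tab-separated line into typed fields."""
    date, description, amount = line.split("\t")
    return Record(date, description.strip(), float(amount))

@dataclass(frozen=True)
class FieldRule:
    """Transformation rule: regex replacement applied to one text field only."""
    field: str
    pattern: re.Pattern
    replacement: str

    def apply(self, record):
        old = getattr(record, self.field)
        new = self.pattern.sub(self.replacement, old)
        return replace(record, **{self.field: new})

@dataclass(frozen=True)
class AmountFilter:
    """Filtering rule: keep records whose amount is at least `minimum`."""
    minimum: float

    def keep(self, record):
        return record.amount >= self.minimum

lines = ["2014-03-09\tLUNCH  cafe\t12.50", "2014-03-10\tcoffee\t0.00"]
tidy = FieldRule("description", re.compile(r"\s+"), " ")
nonzero = AmountFilter(0.01)

records = [tidy.apply(parse_record(line)) for line in lines]
kept = [r for r in records if nonzero.keep(r)]
print(kept)
```

With typed fields, a numeric filter no longer needs a regular expression at all, and a text rule cannot accidentally match inside a different field.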
