Here are a few nifty vala features I did not already know.
I can run "vala foo.vala" and it apparently does "valac foo.vala; ./foo".
Also, I can have preprocessing directives like this
"#if DUCK
stdout.printf ("QUACK");
#else
stdout.printf ("Woof");
#endif"
I can't seem to use #define, but I can do
"valac -D DUCK foo.vala"
which will define the symbol.
Hooray/boo for conditional compilation!
2013-09-30
2013-09-25
[GNOME] Final Report for GXml in the 2013 Google Summer of Code
The Google Summer of Code has ended, and GXml is spoiled with the fruits of labour:
- the autotools build system has improved
- documentation is more complete and more accurate
- many new examples across most classes, especially for C and JavaScript
- many bugs were flushed out and fixed (e.g. attribute syncing between underlying libxml2 xmlNodes and GXmlElements)
- it has a mailing list (gxml-list@gnome.org)
- new stuff
  - document child management, node cloning
  - new memory tests
  - new error handling model
  - new memory handling model (fixing leaks and improving performance!)
  - improved API compliance
- bug-fix release (0.3.2) without API breaks
- imminent 0.4.0 with API breaks (pending some updated patches for XPath, Serialization, etc)
I've talked about those before (near the start and while at GUADEC) so for my report I'm going to focus on the outcome in terms of performance.
Look forward to 0.4.0 imminently, and happy hacking.
GXml's performance versus pure libxml2
One question people have had is the difference in performance between libxml2 and GXml, since GXml currently wraps it. Things should be worse, as there's typically more code for each operation, but how large will the penalty be and will it matter for you?
Tests
I created a simple test suite with the four following tasks:
- loading a file from disk
- loading a file from memory
- stringifying a document
- saving a document to disk
The test suite is highly modular, and it's easy to add new tests. For
each test, you define a setup function, a test function (the measured
test), and a cleanup function. So if you'd like to see anything else in particular tested, let me know.
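As a rough illustration of that setup/test/cleanup shape, here is a minimal sketch in C. The names PerfTest, perf_test_run_once and the example hooks are made up for this sketch and are not the ones used in the actual suite.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the names used in GXml's suite. */
typedef struct {
    const char *name;
    void *(*setup)(const char *path);  /* prepare state, not measured */
    void  (*run)(void *state);         /* the measured task */
    void  (*cleanup)(void *state);     /* release state, not measured */
} PerfTest;

/* Runs one trial of a test against one file: setup, then run, then cleanup. */
static void perf_test_run_once(const PerfTest *test, const char *path)
{
    assert(test != NULL && test->run != NULL);
    void *state = test->setup ? test->setup(path) : NULL;
    test->run(state);
    if (test->cleanup)
        test->cleanup(state);
}

/* A trivial example test that records the order in which its hooks ran. */
static char call_log[4];
static size_t call_count;

static void *example_setup(const char *path)
{
    (void) path;
    call_log[call_count++] = 's';
    return call_log;
}

static void example_run(void *state)
{
    (void) state;
    call_log[call_count++] = 'r';
}

static void example_cleanup(void *state)
{
    (void) state;
    call_log[call_count++] = 'c';
}

static const PerfTest example_test = {
    "example", example_setup, example_run, example_cleanup
};
```

Calling perf_test_run_once (&example_test, "test3.xml") runs the three hooks in order. A real test would load or save a document in run and only time that step.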
Environment
I've run it on a Lenovo ThinkPad Twist S230u with the following configuration:
- Intel® Core™ i5-3317U CPU @ 1.70GHz × 4
- 4GB RAM, SODIMM DDR3 Synchronous 1333 MHz (0,8 ns)
- 500GB HD @ 5400 RPM (HGST HTS725050A7)
  - /home, including test files
- 24GB SSD (Samsung MZMPA024)
  - everything outside of /home, including libraries
- Fedora 19, x86_64
- libxml2-2.9.1-1.fc19
- GXml from git HEAD
Test Data
The test data was based on my updateinfo.xml files from yum, in particular the one found at /var/cache/yum/x86_64/19/updates/gen/updateinfo.xml. It contained 98743 different nodes over 11,136kB. I created smaller and larger versions of it, resulting in:
name | nodes | size (kB) |
---|---|---|
test3.xml | 22 276 | 2 784 |
test4.xml | 47 707 | 5 568 |
test5.xml | 98 743 | 11 136 |
test6.xml | 197 484 | 22 268 |
test7.xml | 394 966 | 44 536 |
This testing could be improved by using different types of files with different types of data: flatter ones versus deeper ones, for instance. The different sizes were produced by either duplicating the content within the root element or by deleting the second half of the nodes inside the root element. test5.xml represents the original updateinfo.xml.
Measurements
Three values were measured. One was time taken to complete a task (like load a file), using g_get_monotonic_time, which reports in microseconds. One was memory used by the task after it completed, using mallinfo, in particular the uordblks field (total allocated space), and one was memory leaks (also using mallinfo, after we freed memory).
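A rough sketch of how one such measurement could be taken in C. Several things here are assumptions: clock_gettime (CLOCK_MONOTONIC) stands in for g_get_monotonic_time so the example needs nothing beyond the C library, mallinfo2 is used where glibc provides it (2.33 and later) and mallinfo otherwise, and the names Measurement, measure and allocate_block are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <malloc.h>   /* mallinfo(), glibc-specific */
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Monotonic clock in microseconds (stand-in for g_get_monotonic_time). */
static int64_t monotonic_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t) ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Total allocated space on the heap, i.e. the uordblks field. */
static size_t heap_bytes_in_use(void)
{
#if defined(__GLIBC__) && (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 33))
    return mallinfo2().uordblks;          /* newer glibc, size_t fields */
#else
    return (size_t) mallinfo().uordblks;  /* the call named in the post */
#endif
}

typedef struct {
    int64_t elapsed_us;  /* time the task took */
    long    retained;    /* bytes still allocated afterwards, relative to before */
} Measurement;

/* Measures one run of a task; heap usage is sampled outside the timed span. */
static Measurement measure(void (*task)(void *), void *arg)
{
    assert(task != NULL);
    size_t bytes_before = heap_bytes_in_use();
    int64_t us_before = monotonic_us();
    task(arg);
    int64_t us_after = monotonic_us();
    size_t bytes_after = heap_bytes_in_use();
    Measurement m = { us_after - us_before,
                      (long) bytes_after - (long) bytes_before };
    return m;
}

/* Example task: allocate a block and keep it, so it counts as retained. */
static void *volatile kept_block;

static void allocate_block(void *arg)
{
    size_t *size = arg;
    kept_block = malloc(*size);
}
```

Measuring a task both right after it completes and again after its cleanup gives the "memory used" and "memory leaked" figures respectively. Note that glibc serves very large allocations through mmap, which mallinfo reports in a different field (hblkhd), so uordblks alone may not tell the whole story.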
Procedure
I ran the tests once averaged over 10 trials for each combination of test and file, and then again averaged over 25 trials. Ways the procedure could be improved include better isolation on the system from other processes, or providing more detail than the averaged scores, so we can detect anomalies (e.g. some other process delaying a file load by hogging I/O).
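To give an idea of what "more detail than the averaged scores" could mean, here is a small sketch in C that keeps the minimum and maximum alongside the mean for a set of trial timings. The TrialStats type and summarise_trials function are hypothetical and not part of the test suite.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    double  mean;
    int64_t min;
    int64_t max;
} TrialStats;

/* Summarises n trial measurements (e.g. times in microseconds). */
static TrialStats summarise_trials(const int64_t *samples, size_t n)
{
    assert(samples != NULL && n > 0);
    TrialStats s = { 0.0, samples[0], samples[0] };
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += (double) samples[i];
        if (samples[i] < s.min)
            s.min = samples[i];
        if (samples[i] > s.max)
            s.max = samples[i];
    }
    s.mean = sum / (double) n;
    return s;
}
```

With trials of 100, 110, 90 and 400 microseconds, the mean is 175 but the maximum of 400 makes the one disturbed trial stand out, which the average alone would hide.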
Results
Keep in mind that GXml wraps libxml2 for most functionality, so we
don't expect it to be faster than libxml2. Rather, we want to see what
penalty a GObject wrapper (written in Vala) causes.
Memory Leaks
GXml was leaking memory like a sieve before the summer (0.3.2 includes memory leak fixes without the API breaks!), so I wanted to know what memory was left after these tasks from both libxml2 and GXml. Luckily, neither leaked any in the cases tested. (That does not mean there aren't any! Kudos to those who find them, and more to those who patch them.)
Results
task | measure | file | libxml2 | GXml | ratio (GXml / libxml2) |
---|---|---|---|---|---|
load disk | memory (bytes) | test3.xml | 20814019 | 23667584 | 1.1371 |
load disk | memory (bytes) | test4.xml | 42604277 | 48477152 | 1.1378 |
load disk | memory (bytes) | test5.xml | 86151738 | 98065217 | 1.1383 |
load disk | memory (bytes) | test6.xml | 172261657 | 196126066 | 1.1385 |
load disk | memory (bytes) | test7.xml | 344483559 | 392241280 | 1.1386 |
load disk | time (µs) | test3.xml | 37547 | 56513 | 1.5051 |
load disk | time (µs) | test4.xml | 66747 | 63797 | 0.9558 |
load disk | time (µs) | test5.xml | 144234 | 161024 | 1.1164 |
load disk | time (µs) | test6.xml | 284488 | 287911 | 1.0120 |
load disk | time (µs) | test7.xml | 561406 | 564904 | 1.0062 |
load mem | memory (bytes) | test3.xml | 24988568 | 28866015 | 1.1552 |
load mem | memory (bytes) | test4.xml | 51434229 | 59523841 | 1.1573 |
load mem | memory (bytes) | test5.xml | 104192043 | 120665588 | 1.1581 |
load mem | memory (bytes) | test6.xml | 208356730 | 241330737 | 1.1583 |
load mem | memory (bytes) | test7.xml | 343791009 | 391564027 | 1.1390 |
load mem | time (µs) | test3.xml | 44199 | 53860 | 1.2186 |
load mem | time (µs) | test4.xml | 84215 | 71695 | 0.8513 |
load mem | time (µs) | test5.xml | 172920 | 184735 | 1.0683 |
load mem | time (µs) | test6.xml | 347157 | 359909 | 1.0367 |
load mem | time (µs) | test7.xml | 572627 | 555519 | 0.9701 |
save | time (µs) | test3.xml | 25610 | 24513 | 0.9572 |
save | time (µs) | test4.xml | 52908 | 49175 | 0.9294 |
save | time (µs) | test5.xml | 96449 | 98308 | 1.0193 |
save | time (µs) | test6.xml | 192197 | 196295 | 1.0213 |
save | time (µs) | test7.xml | 384343 | 395194 | 1.0282 |
stringify | memory (bytes) | test3.xml | 2735339 | 3136192 | 1.1465 |
stringify | memory (bytes) | test4.xml | 5696496 | 6287776 | 1.1038 |
stringify | memory (bytes) | test5.xml | 11394656 | 12592800 | 1.1051 |
stringify | memory (bytes) | test6.xml | 22789264 | 25185552 | 1.1051 |
stringify | time (µs) | test3.xml | 22873 | 26749 | 1.1695 |
stringify | time (µs) | test4.xml | 46166 | 54537 | 1.1813 |
stringify | time (µs) | test5.xml | 93205 | 111312 | 1.1943 |
stringify | time (µs) | test6.xml | 198988 | 235645 | 1.1842 |
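The last column of the table is the GXml value divided by the libxml2 value, so 1.0 means no overhead and anything below 1.0 means GXml came out ahead. A small C sketch of that arithmetic, with function names invented for illustration:

```c
#include <assert.h>

/* Ratio of the GXml measurement to the libxml2 one (1.0 means no overhead). */
static double overhead_ratio(double libxml2_value, double gxml_value)
{
    assert(libxml2_value > 0.0);
    return gxml_value / libxml2_value;
}

/* The same ratio expressed as a percentage increase over libxml2. */
static double overhead_percent(double libxml2_value, double gxml_value)
{
    return (overhead_ratio(libxml2_value, gxml_value) - 1.0) * 100.0;
}
```

For the test3.xml load-from-disk times, overhead_ratio (37547, 56513) comes to roughly 1.505, which is where the "up to 50% longer" figure in the discussion comes from.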
Discussion
loading documents from disk
When it comes to loading a file from the disk, we compared xmlReadFile versus gxml_document_new_from_path (which uses xmlParseFile).
GXml's memory usage is consistently ~14% higher.
Time-wise, on smaller files, GXml takes up to 50% longer than using libxml2. I'm not sure why test4.xml is miraculously lower in this run. You can see that the larger the file, the smaller the difference, which makes sense, since most of the hard work is done by libxml2 anyway.
loading documents from memory
Memory-wise, we again see a consistent increase of ~14-16%.
Time-wise, GXml again oddly performs better on test4.xml. Otherwise, we see the same trend: there is little difference with larger files.
saving to disk
We don't report memory differences because GXml's save functionality cleans up its use of xmlSaveCtxt before it exits, so we can't (easily) see how much we used. Neither leak, so there is nothing to see there.
Time-wise, it seems to take about the same length of time, but GXml may be trending towards slightly more on the larger files. This could be due to tasks like synchronising data that is initially stored just in GXmlNodes and needs to be copied into the xmlDoc of libxml2 to make it to disk.
stringification
Memory-wise, we typically see an increase of ~10-15%. Note that both failed to handle the stringification of the largest file, test7.xml, which requires further investigation. Stringification with libxml2 was done with xmlDocDumpFormatMemory.
Time-wise, the increase was ~16-20%.
Conclusion
Regarding memory usage, if you use GXml for cases such as these, you can expect around a 15% increase in memory usage. That makes sense, as GObjects are used instead of the light C structures libxml2 typically uses. One benefit of wrapping libxml2 is that we don't actually create a GXmlNode for every xmlNode in a document, only for the ones we use, so a pure GObject implementation might use more memory.
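The idea of only creating wrappers for the nodes that get used can be sketched in C as below. This is an illustration under stated assumptions, not GXml's actual implementation: RawNode is a hypothetical stand-in for a libxml2 xmlNode, Wrapper for a heavier GObject-based node, and node_get_wrapper is an invented helper.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins: RawNode plays the role of a light C node,
 * Wrapper the role of a heavier object created around it. */
typedef struct Wrapper Wrapper;

typedef struct RawNode {
    const char *name;
    struct RawNode *next;
    Wrapper *wrapper;   /* NULL until somebody asks for a wrapper */
} RawNode;

struct Wrapper {
    RawNode *raw;
    /* a real wrapper would carry extra per-object state here */
};

static size_t wrappers_created;

/* Returns the wrapper for a raw node, creating it on first use only. */
static Wrapper *node_get_wrapper(RawNode *raw)
{
    assert(raw != NULL);
    if (raw->wrapper == NULL) {
        Wrapper *w = malloc(sizeof *w);
        if (w == NULL)
            return NULL;
        w->raw = raw;
        raw->wrapper = w;
        wrappers_created++;
    }
    return raw->wrapper;
}
```

Asking twice for the same node returns the same wrapper, and nodes that are never asked for cost nothing beyond the raw structure. That is the saving a pure GObject tree would not get.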
Regarding time usage, the difference for some operations is small, a couple percent, and for others, the difference is larger with smaller files, as big as 50% when loading a smaller file. Larger files in those cases (such as loading documents) see less and less of a penalty.
I feel as though for many common applications, these don't represent a significant penalty (the extra time GXml took in these tests was at most a few dozen milliseconds), and can be worth the benefits of using a GObject API.
Going forward
If you're interested in more about GXml's performance, the test suite will be in gxml/tests/performance/. Feel free to submit new tests and test files.
Regarding GXml, HEAD will be pushed out in a new feature release including the API changes, fancy new features, and contributions from others, including Daniel Espinosa, Adam Ples, Simon Reimer, and others.
Cheerio!
2013-09-17
[Technology] Mozilla removing certificates
Mozilla removes certificate revocation lists from Firefox. I wonder if this is a good idea.