- the autotools build system has improved
- documentation is more complete and more accurate
- many new examples across most classes, especially for C and JavaScript
- many bugs were flushed out and fixed (e.g. attribute syncing between underlying libxml2 xmlNodes and GXmlElements)
- it has a mailing list (gxml-list@gnome.org)
- new stuff
  - document child management, node cloning
  - new memory tests
  - new error handling model
  - new memory handling model (fixing leaks and improving performance!)
- improved API compliance
- bug-fix release (0.3.2) without API breaks
- imminent 0.4.0 with API breaks (pending some updated patches for XPath, Serialization, etc)
I've talked about those before (near the start and while at GUADEC) so for my report I'm going to focus on the outcome in terms of performance.
Look forward to 0.4.0 imminently, and happy hacking.
GXml's performance versus pure libxml2
One question people have had is how GXml's performance compares with libxml2's, since GXml currently wraps it. GXml should be somewhat slower and heavier, as there's typically more code for each operation, but how large is the penalty, and will it matter for you?
Tests
I created a simple test suite with the four following tasks:
- loading a file from disk
- loading a file from memory
- stringifying a document
- saving a document to disk
The test suite is highly modular, and it's easy to add new tests. For
each test, you define a setup function, a test function (the measured
test), and a cleanup function. So if you'd like to see anything else in particular tested, let me know.
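To make the setup / test / cleanup split concrete, here is a minimal sketch in C of how such a test could be described and timed. It is not the code from the GXml test suite: the names `perf_test`, `run_trial`, and the example hooks are hypothetical, and it uses POSIX `clock_gettime` as a stand-in for `g_get_monotonic_time` so that it builds without GLib.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical test description: three hooks sharing one context pointer. */
typedef struct {
    const char *name;
    void *(*setup)(void);         /* prepares inputs; not timed */
    void  (*run)(void *ctx);      /* the measured operation */
    void  (*cleanup)(void *ctx);  /* releases resources; not timed */
} perf_test;

/* Monotonic clock in microseconds (assumed stand-in for g_get_monotonic_time). */
static int64_t now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Run one trial and return the elapsed time of the measured step only. */
static int64_t run_trial(const perf_test *t)
{
    assert(t != NULL && t->run != NULL);
    void *ctx = t->setup ? t->setup() : NULL;
    int64_t start = now_us();
    t->run(ctx);
    int64_t elapsed = now_us() - start;
    if (t->cleanup)
        t->cleanup(ctx);
    return elapsed;
}

/* Example hooks: the measured step sums a buffer prepared during setup. */
static void *example_setup(void)
{
    return calloc(1024, sizeof(int));
}

static void example_run(void *ctx)
{
    const int *buf = ctx;
    volatile long sum = 0;
    if (buf == NULL)
        return;
    for (int i = 0; i < 1024; i++)
        sum += buf[i];
}

static void example_cleanup(void *ctx)
{
    free(ctx);
}

static const perf_test example_test = {
    "sum-buffer", example_setup, example_run, example_cleanup
};
```

Calling `run_trial(&example_test)` returns the elapsed microseconds for one run of the measured step. Keeping setup and cleanup outside the timed region means that work such as reading a file into a buffer does not inflate the figure for, say, a parse-from-memory test.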
Environment
I've run it on a Lenovo ThinkPad Twist S230u with the following configuration:
- Intel® Core™ i5-3317U CPU @ 1.70GHz × 4
- 4GB RAM, SODIMM DDR3 Synchronous 1333 MHz (0.8 ns)
- 500GB HD @ 5400 RPM (HGST HTS725050A7)
  - /home, including test files
- 24GB SSD (Samsung MZMPA024)
  - everything outside of /home, including libraries
- Fedora 19, x86_64
  - libxml2-2.9.1-1.fc19
  - GXml from git HEAD
Test Data
The test data was based on my updateinfo.xml files from yum, in particular the one found at /var/cache/yum/x86_64/19/updates/gen/updateinfo.xml. It contained 98743 nodes in 11,136 kB. I created smaller and larger versions of it, resulting in the following files:
name | nodes | size (kB) |
---|---|---|
test3.xml | 22 276 | 2 784 |
test4.xml | 47 707 | 5 568 |
test5.xml | 98 743 | 11 136 |
test6.xml | 197 484 | 22 268 |
test7.xml | 394 966 | 44 536 |
The different sizes were produced either by duplicating the content within the root element or by deleting the second half of the nodes inside the root element. test5.xml represents the original updateinfo.xml. This testing could be improved by using different types of files with different types of data: flatter ones versus deeper ones, for instance.
Measurements
Three values were measured. One was time taken to complete a task (like load a file), using g_get_monotonic_time, which reports in microseconds. One was memory used by the task after it completed, using mallinfo, in particular the uordblks field (total allocated space), and one was memory leaks (also using mallinfo, after we freed memory).
Procedure
I ran each combination of test and file twice: once averaged over 10 trials, and then again averaged over 25 trials. The procedure could be improved by better isolating the tests from other processes on the system, or by reporting more detail than the averaged scores, so that we can detect anomalies (e.g. some other process delaying a file load by hogging I/O).
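One cheap way to report more than the average is to keep the minimum and maximum alongside the mean, so that a single delayed trial stands out. The following is a minimal sketch of that idea; `trial_summary` and `summarize` are hypothetical names, and the sample values in the usage note are made up for illustration rather than taken from the runs reported here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mean, minimum and maximum of a set of per-trial timings (microseconds). */
typedef struct {
    double  mean;
    int64_t min;
    int64_t max;
} trial_summary;

static trial_summary summarize(const int64_t *samples, size_t n)
{
    assert(samples != NULL && n > 0);
    trial_summary s = { 0.0, samples[0], samples[0] };
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += (double)samples[i];
        if (samples[i] < s.min)
            s.min = samples[i];
        if (samples[i] > s.max)
            s.max = samples[i];
    }
    s.mean = total / (double)n;
    return s;
}
```

For the made-up samples {100, 110, 90, 500}, the mean is 200 while the minimum is 90 and the maximum is 500: the mean alone hides that three of the four trials took about half that long.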
Results
Keep in mind that GXml wraps libxml2 for most functionality, so we don't expect it to be faster than libxml2; rather, we want to see what penalty a GObject wrapper (written in Vala) causes.
Memory Leaks
GXml was leaking memory like a sieve before the summer (0.3.2 includes memory leak fixes without the API breaks!), so I wanted to know what memory was left after these tasks from both libxml2 and GXml. Luckily, neither leaked any in the cases tested. (That does not mean there aren't any leaks! Kudos to those who find them, and more to those who patch them.)
Results
task | measure | file | libxml2 | GXml | GXml / libxml2 |
---|---|---|---|---|---|
load disk | memory (bytes) | test3.xml | 20814019 | 23667584 | 1.1371 |
load disk | memory (bytes) | test4.xml | 42604277 | 48477152 | 1.1378 |
load disk | memory (bytes) | test5.xml | 86151738 | 98065217 | 1.1383 |
load disk | memory (bytes) | test6.xml | 172261657 | 196126066 | 1.1385 |
load disk | memory (bytes) | test7.xml | 344483559 | 392241280 | 1.1386 |
load disk | time (µs) | test3.xml | 37547 | 56513 | 1.5051 |
load disk | time (µs) | test4.xml | 66747 | 63797 | 0.9558 |
load disk | time (µs) | test5.xml | 144234 | 161024 | 1.1164 |
load disk | time (µs) | test6.xml | 284488 | 287911 | 1.0120 |
load disk | time (µs) | test7.xml | 561406 | 564904 | 1.0062 |
load mem | memory (bytes) | test3.xml | 24988568 | 28866015 | 1.1552 |
load mem | memory (bytes) | test4.xml | 51434229 | 59523841 | 1.1573 |
load mem | memory (bytes) | test5.xml | 104192043 | 120665588 | 1.1581 |
load mem | memory (bytes) | test6.xml | 208356730 | 241330737 | 1.1583 |
load mem | memory (bytes) | test7.xml | 343791009 | 391564027 | 1.1390 |
load mem | time (µs) | test3.xml | 44199 | 53860 | 1.2186 |
load mem | time (µs) | test4.xml | 84215 | 71695 | 0.8513 |
load mem | time (µs) | test5.xml | 172920 | 184735 | 1.0683 |
load mem | time (µs) | test6.xml | 347157 | 359909 | 1.0367 |
load mem | time (µs) | test7.xml | 572627 | 555519 | 0.9701 |
save | time (µs) | test3.xml | 25610 | 24513 | 0.9572 |
save | time (µs) | test4.xml | 52908 | 49175 | 0.9294 |
save | time (µs) | test5.xml | 96449 | 98308 | 1.0193 |
save | time (µs) | test6.xml | 192197 | 196295 | 1.0213 |
save | time (µs) | test7.xml | 384343 | 395194 | 1.0282 |
stringify | memory (bytes) | test3.xml | 2735339 | 3136192 | 1.1465 |
stringify | memory (bytes) | test4.xml | 5696496 | 6287776 | 1.1038 |
stringify | memory (bytes) | test5.xml | 11394656 | 12592800 | 1.1051 |
stringify | memory (bytes) | test6.xml | 22789264 | 25185552 | 1.1051 |
stringify | time (µs) | test3.xml | 22873 | 26749 | 1.1695 |
stringify | time (µs) | test4.xml | 46166 | 54537 | 1.1813 |
stringify | time (µs) | test5.xml | 93205 | 111312 | 1.1943 |
stringify | time (µs) | test6.xml | 198988 | 235645 | 1.1842 |
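The last column of the table above is consistent with the GXml figure divided by the libxml2 figure, so a value above 1 means GXml used more memory or time, and a value below 1 means it used less. A small sketch of that arithmetic follows. The helper names `overhead_ratio`, `overhead_percent`, and `print_row` are hypothetical, and the numbers in the usage note come from the table.

```c
#include <assert.h>
#include <stdio.h>

/* Ratio of the GXml measurement to the libxml2 measurement. */
static double overhead_ratio(double gxml, double libxml2)
{
    assert(libxml2 > 0.0);
    return gxml / libxml2;
}

/* The same comparison expressed as a percentage change relative to libxml2. */
static double overhead_percent(double gxml, double libxml2)
{
    return (overhead_ratio(gxml, libxml2) - 1.0) * 100.0;
}

/* Print one comparison row: file, both measurements, ratio, and percentage. */
static void print_row(const char *file, double libxml2, double gxml)
{
    printf("%s | %.0f | %.0f | %.4f | %+.2f%%\n",
           file, libxml2, gxml,
           overhead_ratio(gxml, libxml2),
           overhead_percent(gxml, libxml2));
}
```

For example, the memory used when loading test3.xml from disk gives 23667584 / 20814019 ≈ 1.1371, i.e. roughly 13.7% more memory for GXml, while the time for loading test4.xml from disk gives 63797 / 66747 ≈ 0.9558, the one case in that group where GXml came out ahead.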
Discussion
loading documents from disk
When it comes to loading a file from the disk, we compared xmlReadFile versus gxml_document_new_from_path (which uses xmlParseFile).
GXml's memory usage is consistently ~14% higher than libxml2's.
Time-wise, on smaller files, GXml takes up to 50% longer than libxml2. I'm not sure why test4.xml is miraculously faster with GXml in this run. You can see that the larger the file, the smaller the difference, which makes sense, since most of the hard work is done by libxml2 anyway.
loading documents from memory
Memory-wise, we again see a consistent increase, this time of ~14-16%.
Time-wise, GXml again oddly performs better on test4.xml. Otherwise, we see the same trend: there is little difference with larger files.
saving to disk
We don't report memory differences because GXml's save functionality cleans up its use of xmlSaveCtxt before it exits, so we can't (easily) see how much we used. Neither library leaks here, so there is nothing to see on that front.
Time-wise, it seems to take about the same length of time, but GXml may be trending to more. This could be due to tasks like synchronising data that is initially stored just in GXmlNodes and needs to be copied into the xmlDoc of libxml2 to make it to disk.
stringification
Stringification was done with xmlDocDumpFormatMemory. Memory-wise, we typically see an increase of ~10-15%. Note that both libraries failed to stringify the largest file, test7.xml, which is why it is missing from the results; this requires further investigation.
Time-wise, the increase was ~16-20%.
Conclusion
Regarding memory usage, if you use GXml for cases such as these, you can expect around a 15% increase in memory usage. That makes sense, as GObjects are used instead of the light C structures libxml2 typically uses. One benefit of wrapping libxml2 is that we don't actually create a GXmlNode for every xmlNode in a document, only for the ones we use, so a pure GObject implementation might use more memory.
Regarding time usage, the difference for some operations is small, a couple percent, and for others, the difference is larger with smaller files, as big as 50% when loading a smaller file. Larger files in those cases (such as loading documents) see less and less of a penalty.
I feel that for many common applications these don't represent a significant penalty (in the tests above, GXml added less than 20 milliseconds to any document load), and they can be worth the benefits of using a GObject API.
Going forward
If you're interested in more about GXml's performance, the test suite will be in gxml/tests/performance/. Feel free to submit new tests and test files.
Regarding GXml, HEAD will be pushed out in a new feature release including the API changes, fancy new features, and contributions from others, including Daniel Espinosa, Adam Ples, Simon Reimer, and others.
Cheerio!