.plan: September 2013

The Google Summer of Code has ended, and GXml is spoiled with the fruits of labour:

the autotools build system has improved

documentation is more complete and more accurate

many new examples across most classes, especially for C and JavaScript

many bugs were flushed out and fixed (e.g. attribute syncing between underlying libxml2 xmlNodes and GXmlElements)

it has a mailing list (gxml-list@gnome.org)

new stuff

document child management, node cloning

new memory tests

new error handling model

new memory handling model (fixing leaks and improving performance!)

improved API compliance

bug-fix release (0.3.2) without API breaks

imminent 0.4.0 with API breaks (pending some updated patches for XPath, Serialization, etc)

I've talked about those before (near the start and while at GUADEC) so for my report I'm going to focus on the outcome in terms of performance.

Look forward to 0.4.0 imminently, and happy hacking.

GXml's performance versus pure libxml2

One question people have had is the difference in performance between libxml2 and GXml, since GXml currently wraps it. Things should be worse, as there's typically more code for each operation, but how large will the penalty be and will it matter for you?

Tests

I created a simple test suite with the following four tasks:

loading a file from disk

loading a file from memory

stringifying a document

saving a document to disk

The test suite is highly modular, and it's easy to add new tests. For
each test, you define a setup function, a test function (the measured
test), and a cleanup function. So if you'd like to see anything else in particular tested, let me know.
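As a rough illustration of that setup / test / cleanup shape, here is a minimal sketch in C. It is not the actual GXml test suite: the names PerfTest, run_perf_test, and the example_* functions are hypothetical, and clock_gettime(CLOCK_MONOTONIC) stands in for GLib's timer so the sketch has no dependencies. Only the test function is timed; setup and cleanup run outside the measured window.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime */
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* One benchmark: setup and cleanup run outside the timed region. */
typedef struct {
    const char *name;
    void *(*setup)(void);          /* prepare inputs, returns test state */
    void  (*run)(void *state);     /* the measured operation */
    void  (*cleanup)(void *state); /* release whatever setup created */
} PerfTest;

/* Current monotonic time in microseconds. */
static long long now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long) ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;
}

/* Run a test `trials` times and return the mean duration in microseconds. */
double run_perf_test(const PerfTest *test, int trials)
{
    long long total = 0;
    assert(trials > 0);
    for (int i = 0; i < trials; i++) {
        void *state = test->setup();
        long long start = now_us();
        test->run(state);
        total += now_us() - start;
        test->cleanup(state);
    }
    return (double) total / trials;
}

/* Hypothetical example test: fill a buffer of 1000 ints. */
static int example_setups = 0, example_cleanups = 0;

static void *example_setup(void)
{
    example_setups++;
    return calloc(1000, sizeof(int));
}

static void example_run(void *state)
{
    int *values = state;
    if (values == NULL)
        return;
    for (int i = 0; i < 1000; i++)
        values[i] = i * 2;
}

static void example_cleanup(void *state)
{
    example_cleanups++;
    free(state);
}

const PerfTest example_test = {
    "fill buffer", example_setup, example_run, example_cleanup
};
```

Calling run_perf_test(&example_test, 10) would run ten trials and return the mean time in microseconds. Adding a new test in this shape is just a matter of supplying three functions.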

Environment

I've run it on a Lenovo ThinkPad Twist S230u with the following configuration:

Intel® Core™ i5-3317U CPU @ 1.70GHz × 4

4GB RAM, SODIMM DDR3 Synchronous 1333 MHz (0,8 ns)

500GB HD @ 5400 RPM (HGST HTS725050A7)

/home, including test files

24GB SSD (Samsung MZMPA024)

everything outside of /home, including libraries

Fedora 19, x86_64

libxml2-2.9.1-1.fc19

GXml from git HEAD

Test Data

The test data was based on my updateinfo.xml files from yum, in particular the one found at: /var/cache/yum/x86_64/19/updates/gen/updateinfo.xml. It contained 98743 different nodes over 11,136kB. I created smaller and larger versions of it, resulting in:

name	nodes	size (kB)
test3.xml	22 276	2 784
test4.xml	47 707	5 568
test5.xml	98 743	11 136
test6.xml	197 484	22 268
test7.xml	394 966	44 536

This testing could be improved by using different types of files with different types of data: flatter ones versus deeper ones, for instance. The different sizes were produced either by duplicating the content within the root element or by deleting the second half of the nodes inside the root element. test5.xml represents the original updateinfo.xml.

Measurements

Three values were measured. One was time taken to complete a task (like load a file), using g_get_monotonic_time, which reports in microseconds. One was memory used by the task after it completed, using mallinfo, in particular the uordblks field (total allocated space), and one was memory leaks (also using mallinfo, after we freed memory).
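To make the memory part of that concrete, here is a minimal sketch of reading uordblks around a task and again after freeing. It is an illustration, not the code from the test suite: heap_in_use, MemReport, measure_memory, and the example_* functions are hypothetical names, the allocation task is a stand-in for loading a document, and it assumes glibc (it uses mallinfo2 where available, since newer glibc deprecates mallinfo).

```c
#include <malloc.h>   /* glibc-specific: mallinfo / mallinfo2 */
#include <stdlib.h>

#if defined(__GLIBC__) && defined(__GLIBC_PREREQ)
# if __GLIBC_PREREQ(2, 33)
#  define HAVE_MALLINFO2 1
# endif
#endif

#define EXAMPLE_BLOCKS 1000
#define EXAMPLE_BLOCK_SIZE 256

/* Bytes currently allocated, as reported in the uordblks field. */
static long heap_in_use(void)
{
#ifdef HAVE_MALLINFO2
    struct mallinfo2 info = mallinfo2();
#else
    struct mallinfo info = mallinfo();
#endif
    return (long) info.uordblks;
}

typedef struct {
    long used_after_task; /* growth in allocated bytes once the task is done */
    long leaked;          /* growth still present after releasing the result */
} MemReport;

/* Measure allocated bytes before the task, after it, and after cleanup. */
MemReport measure_memory(void *(*task)(void), void (*release)(void *))
{
    MemReport report;
    long before = heap_in_use();
    void *result = task();
    long after_task = heap_in_use();
    release(result);
    long after_free = heap_in_use();
    report.used_after_task = after_task - before;
    report.leaked = after_free - before;
    return report;
}

/* Stand-in task: allocate many small blocks, as a parsed tree would. */
static void *example_alloc(void)
{
    char **blocks = malloc(EXAMPLE_BLOCKS * sizeof(char *));
    if (blocks == NULL)
        return NULL;
    for (int i = 0; i < EXAMPLE_BLOCKS; i++)
        blocks[i] = malloc(EXAMPLE_BLOCK_SIZE);
    return blocks;
}

static void example_free(void *result)
{
    char **blocks = result;
    if (blocks == NULL)
        return;
    for (int i = 0; i < EXAMPLE_BLOCKS; i++)
        free(blocks[i]);
    free(blocks);
}
```

One caveat with this approach: the allocator may keep a few freed chunks in its own caches, so a small non-zero residual after cleanup does not necessarily indicate a leak. Very large single allocations may also be served outside the main heap and so not show up in uordblks at all.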

Procedure

I ran the tests once averaged over 10 trials for each combination of test and file, and then again over 25 trials. Ways the procedure could be improved include better isolation from other processes on the system, or reporting more detail than the averaged scores, so that we can detect anomalies (e.g. some other process delaying a file load by hogging I/O).

Results

Keep in mind that GXml wraps libxml2 for most functionality, so we
don't expect it to be faster than libxml2, rather we want to see what
penalty a GObject wrapper (written in Vala) causes.

Memory Leaks

GXml was leaking memory like a sieve before the summer (0.3.2 includes memory leak fixes without the API breaks!), so I wanted to know what memory was left after these tasks from both libxml2 and GXml. Luckily, neither leaked any in the cases tested. (That does not mean there aren't any leaks! Kudos to those who find them, and more to those who patch them.)

Results

data	libxml2	gxml	diff (gxml / libxml2)
load disk
memory (bytes)
test3.xml	20814019	23667584	1.1371
test4.xml	42604277	48477152	1.1378
test5.xml	86151738	98065217	1.1383
test6.xml	172261657	196126066	1.1385
test7.xml	344483559	392241280	1.1386
time (µs)
test3.xml	37547	56513	1.5051
test4.xml	66747	63797	0.9558
test5.xml	144234	161024	1.1164
test6.xml	284488	287911	1.0120
test7.xml	561406	564904	1.0062
load mem
memory (bytes)
test3.xml	24988568	28866015	1.1552
test4.xml	51434229	59523841	1.1573
test5.xml	104192043	120665588	1.1581
test6.xml	208356730	241330737	1.1583
test7.xml	343791009	391564027	1.1390
time (µs)
test3.xml	44199	53860	1.2186
test4.xml	84215	71695	0.8513
test5.xml	172920	184735	1.0683
test6.xml	347157	359909	1.0367
test7.xml	572627	555519	0.9701
save
time (µs)
test3.xml	25610	24513	0.9572
test4.xml	52908	49175	0.9294
test5.xml	96449	98308	1.0193
test6.xml	192197	196295	1.0213
test7.xml	384343	395194	1.0282
stringify
memory (bytes)
test3.xml	2735339	3136192	1.1465
test4.xml	5696496	6287776	1.1038
test5.xml	11394656	12592800	1.1051
test6.xml	22789264	25185552	1.1051
time (µs)
test3.xml	22873	26749	1.1695
test4.xml	46166	54537	1.1813
test5.xml	93205	111312	1.1943
test6.xml	198988	235645	1.1842
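
The last column of the table is consistent with the GXml value divided by the libxml2 value, so anything above 1 means GXml used more memory or time. A small sketch of that arithmetic follows; the function names are hypothetical and only there to make the calculation explicit.

```c
/* Ratio of the GXml measurement to the libxml2 one; 1.0 means no overhead. */
double overhead_ratio(double libxml2_value, double gxml_value)
{
    return gxml_value / libxml2_value;
}

/* The same ratio expressed as a percentage increase over libxml2. */
double overhead_percent(double libxml2_value, double gxml_value)
{
    return (overhead_ratio(libxml2_value, gxml_value) - 1.0) * 100.0;
}
```

For the first row (loading test3.xml from disk), overhead_ratio(20814019, 23667584) comes out at roughly 1.137, or about 14% more memory than libxml2.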

Discussion

loading documents from disk

When it comes to loading a file from the disk, we compared xmlReadFile versus gxml_document_new_from_path (which uses xmlParseFile).

GXml's memory usage is consistently ~14% higher than libxml2's.

Time-wise, on smaller files, GXml takes up to 50% longer than libxml2. I'm not sure why test4.xml is miraculously lower in this run. You can see that the larger the file, the smaller the difference, which makes sense, since most of the hard work is done by libxml2 anyway.

loading documents from memory

With memory, again, we see a consistent increase of ~14-16%.

Time-wise, again GXml oddly performs better on test4.xml. Elsewise, we see the same trend: there is little difference with larger files.

saving to disk

We don't report memory differences because GXml's save functionality cleans up its use of xmlSaveCtxt before it exits, so we can't (easily) see how much we used. Neither leak, so there is nothing to see there.

Time-wise, it seems to take about the same length of time, but GXml may be trending slower as the files grow. This could be due to tasks like synchronising data that is initially stored just in GXmlNodes and needs to be copied into the xmlDoc of libxml2 to make it to disk.

stringification

Memory-wise, we typically see an increase of ~10-15%. Note that stringification of the largest file, test7.xml, failed, which requires further investigation; it is therefore missing from the results. Stringification was done with xmlDocDumpFormatMemory.

Time-wise, the increase was ~16-20%.

Conclusion

Regarding memory usage, if you use GXml for cases such as these, you can expect around a 15% increase in memory usage. That makes sense, as GObjects are used instead of the light C structures libxml2 typically uses. One benefit of wrapping libxml2 is that we don't actually create a GXmlNode for every xmlNode in a document, only the ones we use, so a pure GObject implementation might use more memory.

Regarding time usage, the difference for some operations is small, a couple percent, and for others, the difference is larger with smaller files, as big as 50% when loading a smaller file. Larger files in those cases (such as loading documents) see less and less of a penalty.

I feel as though for many common applications, these don't represent a significant penalty (loading even the largest test file took under 0.6 seconds with either library), and can be worth the benefits of using a GObject API.

Going forward

If you're interested in more about GXml's performance, the test suite will be in gxml/tests/performance/. Feel free to submit new tests and test files.

Regarding GXml, HEAD will be pushed out in a new feature release including the API changes, fancy new features, and contributions from others, including Daniel Espinosa, Adam Ples, Simon Reimer, and others.

Cheerio!

2013-09-25

[GNOME] Final Report for GXml in the 2013 Google Summer of Code
