<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Intrepid Blog &#187; benchmark</title>
	<atom:link href="http://blog.affien.com/archives/tag/benchmark/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.affien.com</link>
	<description>A few thoughts</description>
	<lastBuildDate>Mon, 23 Jan 2012 08:47:37 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>git&#8217;s versus svn&#8217;s storage efficiency</title>
		<link>http://blog.affien.com/archives/2008/07/08/gits-versus-svns-storage-efficiency/</link>
		<comments>http://blog.affien.com/archives/2008/07/08/gits-versus-svns-storage-efficiency/#comments</comments>
		<pubDate>Tue, 08 Jul 2008 00:43:47 +0000</pubDate>
		<dc:creator>Bas Westerbaan</dc:creator>
				<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[codeyard]]></category>
		<category><![CDATA[cyv]]></category>
		<category><![CDATA[git]]></category>
		<category><![CDATA[storage]]></category>
		<category><![CDATA[svn]]></category>

		<guid isPermaLink="false">http://blog.w-nz.com/?p=319</guid>
		<description><![CDATA[At Codeyard we maintain a git and a subversion reposito [...]]]></description>
			<content:encoded><![CDATA[<p>At <a href="http://codeyard.net/">Codeyard</a> we maintain a git and a subversion repository (which <a href="http://blog.w-nz.com/archives/2008/05/26/cyv-syncing-git-and-svn/">are synced</a> with each other) for each of our more than 115 projects. The following graph plots each project on logarithmic axes, with the size of its entire server-side subversion repository horizontally and the size of its git repository vertically:<br />
<a href='http://blog.w-nz.com/wp-content/uploads/2008/07/git-vs-svn.png'><img src="http://blog.w-nz.com/wp-content/uploads/2008/07/git-vs-svn.png" alt="" title="Git versus SVN storage efficiency" width="428" height="285" class="aligncenter size-full wp-image-320" /></a></p>
<p>To make more sense of the logarithmic nature of the graph, I&#8217;ve added three lines.  The first (solid black) indicates the points at which both sizes are equal.  The second, coarsely dashed line indicates the points at which the subversion repository is twice as large as the git repository.  The third, finely dashed line indicates the points at which the subversion repository is five times as large as the git repository.</p>
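<p>As a rough illustration of why those reference lines run parallel on a log-log plot: a fixed size ratio turns into a constant vertical offset.  The sketch below uses made-up sizes and hypothetical function names; it is not the script that produced the graph.</p>

```python
import math


def size_ratio(svn_bytes, git_bytes):
    """How many times larger the svn repository is than the git one."""
    return svn_bytes / git_bytes


def log_offset(ratio):
    """Vertical distance, in decades, between a ratio line and the svn = git line."""
    return math.log10(ratio)


# Hypothetical sizes in bytes, for illustration only.
examples = [
    (10_000_000, 10_000_000),  # equal sizes: on the solid line
    (10_000_000, 5_000_000),   # svn twice as large
    (10_000_000, 2_000_000),   # svn five times as large
]
for svn_bytes, git_bytes in examples:
    r = size_ratio(svn_bytes, git_bytes)
    print(f"svn/git = {r:.1f}, {log_offset(r):.3f} decades below the equality line")
```

<p>With svn on the horizontal axis and git on the vertical axis, a project whose svn repository is k times larger sits log10(k) decades below the equality line, whatever its absolute size.</p>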
<p>All projects for which git is less storage efficient are smaller than 100 KB.  The projects for which git is most storage efficient (up to 6 times for one C# project) are all of medium size (10&#8211;100 MB) and code-heavy.  For the other projects, which are blob-heavy (e.g. images), git and subversion are close (git beats svn by ~20%).</p>
<p>One notable disadvantage of <em>huge</em> git repositories (someone committed a live CD image) is that <code>git repack</code> appears to use <img src='/wp-latexrender/pictures/f3bca1459595f95e1a28e1176fe9343f.png' title='\geq2N' alt='\geq2N' align=absmiddle> memory, even when I try to limit it with <code>--window-memory</code>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.affien.com/archives/2008/07/08/gits-versus-svns-storage-efficiency/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Benchmarking CouchDB (2)</title>
		<link>http://blog.affien.com/archives/2008/04/15/benchmarking-couchdb-2/</link>
		<comments>http://blog.affien.com/archives/2008/04/15/benchmarking-couchdb-2/#comments</comments>
		<pubDate>Tue, 15 Apr 2008 12:59:42 +0000</pubDate>
		<dc:creator>Bas Westerbaan</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[couchdb]]></category>

		<guid isPermaLink="false">http://blog.w-nz.com/?p=310</guid>
		<description><![CDATA[

This is a plot of the amount of documents created i [...]]]></description>
			<content:encoded><![CDATA[<p><a href='http://blog.w-nz.com/wp-content/uploads/2008/04/couchdb-create-v-per-b.png'><img src="http://blog.w-nz.com/wp-content/uploads/2008/04/couchdb-create-v-per-b.png" alt="" title="couchdb-create-v-per-b" width="400" height="250" class="aligncenter size-full wp-image-311" /></a></p>
<p>This is a plot of the number of documents created per bulk update against the average number of documents created per second that it yields.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.affien.com/archives/2008/04/15/benchmarking-couchdb-2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Benchmarking CouchDB (1)</title>
		<link>http://blog.affien.com/archives/2008/04/02/benchmarking-couchdb-1/</link>
		<comments>http://blog.affien.com/archives/2008/04/02/benchmarking-couchdb-1/#comments</comments>
		<pubDate>Wed, 02 Apr 2008 12:48:31 +0000</pubDate>
		<dc:creator>Bas Westerbaan</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[couchdb]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://blog.w-nz.com/?p=305</guid>
		<description><![CDATA[I've written a small benchmark for couchdb to test its [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve written a small benchmark for couchdb to test its document creation performance.  A script creates <img src='/wp-latexrender/pictures/f9c4988898e7f532b9f826a75014ed3c.png' title='$N$' alt='$N$' align=absmiddle> documents in total, using bulk updates to create <img src='/wp-latexrender/pictures/61e84f854bc6258d4108d08d4c4a0852.png' title='$B$' alt='$B$' align=absmiddle> at a time, with <img src='/wp-latexrender/pictures/2f118ee06d05f3c2d98361d9c30e38ce.png' title='$T$' alt='$T$' align=absmiddle> concurrent threads. The following graph shows the time it takes to create a given number of documents against that number, for different values of <img src='/wp-latexrender/pictures/61e84f854bc6258d4108d08d4c4a0852.png' title='$B$' alt='$B$' align=absmiddle> with <img src='/wp-latexrender/pictures/804c70e2e7d042c95b3a2995ad967adf.png' title='$T=1$' alt='$T=1$' align=absmiddle>.</p>
<p><a href='http://blog.w-nz.com/wp-content/uploads/2008/04/couchdb-create-t1.png'><img src="http://blog.w-nz.com/wp-content/uploads/2008/04/couchdb-create-t1.png" alt="" title="couchdb-create-t1" width="428" height="278" class="aligncenter size-full wp-image-306" /></a></p>
<p>And for <img src='/wp-latexrender/pictures/2cc82c99a866ae9891a91659fe42a57a.png' title='$T=2' alt='$T=2' align=absmiddle> (two concurrent threads, tested on a dual-core machine):<br />
<a href='http://blog.w-nz.com/wp-content/uploads/2008/04/couchdb-create-t2.png'><img src="http://blog.w-nz.com/wp-content/uploads/2008/04/couchdb-create-t2.png" alt="" title="couchdb-create-t2" width="428" height="278" class="aligncenter size-full wp-image-307" /></a><br />
<small>The values of B are 1, 2, 4, 5, 8, 11, 16, 22, 32, 45, 64, 90, 128, 181, 256, 362, 512, 724 and 1024</small></p>
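<p>The listed values of B look like powers of the square root of two, rounded down.  That is an observation about the numbers, not something the benchmark script is known to do; the sketch below (with a hypothetical helper name) reproduces the list under that assumption.</p>

```python
import math


def batch_sizes(max_exponent=20):
    """Distinct values of floor(2 ** (k / 2)) for k = 0 .. max_exponent."""
    sizes = []
    for k in range(max_exponent + 1):
        b = math.isqrt(2 ** k)  # exact integer floor of sqrt(2 ** k)
        if b not in sizes:      # k = 0, 1 both give 1; k = 2, 3 both give 2
            sizes.append(b)
    return sizes


print(batch_sizes())
```

<p>Spacing the batch sizes geometrically like this puts them at roughly even intervals on a logarithmic axis.</p>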
<p>As you can see, a higher value of <img src='/wp-latexrender/pictures/61e84f854bc6258d4108d08d4c4a0852.png' title='$B$' alt='$B$' align=absmiddle> shifts the graph to the right, which means more <img src='/wp-latexrender/pictures/e25ec8b0af895735d0fe10be2ae08fc9.png' title='$N' alt='$N' align=absmiddle> for the same time.  Bulk update really does make a difference.  <em>Or</em> non-bulk-update really sucks.  Adding threads also helps a bit, but not as much as expected.</p>
<p>There are some more interesting graphs to plot (<img src='/wp-latexrender/pictures/61e84f854bc6258d4108d08d4c4a0852.png' title='$B$' alt='$B$' align=absmiddle> against <img src='/wp-latexrender/pictures/c412ce2392f725bea19da9b3385e1b4b.png' title='$\overline {N \over \Delta T} $' alt='$\overline {N \over \Delta T} $' align=absmiddle>).  More graphs tomorrow.</p>
<p>(For those interested, the <a href='http://blog.w-nz.com/wp-content/uploads/2008/04/results'>raw data</a> from which these graphs were plotted.)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.affien.com/archives/2008/04/02/benchmarking-couchdb-1/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>CouchDB document creation performance</title>
		<link>http://blog.affien.com/archives/2008/03/30/couchdb-document-creation-performance/</link>
		<comments>http://blog.affien.com/archives/2008/03/30/couchdb-document-creation-performance/#comments</comments>
		<pubDate>Sat, 29 Mar 2008 23:24:10 +0000</pubDate>
		<dc:creator>Bas Westerbaan</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[couchdb]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://blog.w-nz.com/?p=302</guid>
		<description><![CDATA[CouchDB is a non-relational database which uses MapRedu [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://couchdb.org">CouchDB</a> is a non-relational database which uses <a href="http://labs.google.com/papers/mapreduce.html">MapReduce</a>-inspired views to query data.  There are lots of cool things to say about its design, but I would rather talk about its performance.</p>
<p>Today I&#8217;ve been busy hacking together a little script to import all e-mails of a long e-mail thread into a couchdb database, so that I can write views that extract all kinds of statistics.  I already imported these e-mails into a MySQL database a few months ago, but was quite disappointed by the (performance) limitations of SQL.  The e-mail thread contains over 20,000 messages, which weren&#8217;t a real problem for MySQL.  When importing, however, couchdb was adding them at a rate of only a few dozen per second, with a lot of (seek) noise from my HDD.</p>
<p>So I decided to do a simple benchmark.  First off, a simple script (<a href='http://blog.w-nz.com/wp-content/uploads/2008/03/ser.py'>ser.py</a>) that adds empty documents sequentially.  It averages 16 per second.  It occurred to me that couchdb waits for an <code>fsync</code> before sending a response and that performance would be way better asynchronously. A simple modification to the script later (<a href='http://blog.w-nz.com/wp-content/uploads/2008/03/par.py'>par.py</a>), it still averaged 16 creations per second. </p>
<p>I guess (I haven&#8217;t yet figured out how to get <code>strace</code> to tell me) that it&#8217;s the <code>fsync</code> after each object creation that causes the mess.  couchdb itself doesn&#8217;t write or seek a lot, but my <em>journaling</em> filesystem (XFS) does on an <code>fsync</code>.</p>
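<p>To get a feel for that guess without couchdb in the picture, here is a minimal sketch that appends small records to a plain file, once with an <code>fsync</code> after every record and once with a single <code>fsync</code> at the end.  The file names, record contents and count are arbitrary assumptions, and this says nothing about how couchdb actually lays out its data.</p>

```python
import os
import tempfile
import time


def write_records(path, count, sync_each):
    """Write `count` small records; fsync after every record or only once at the end."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(b"{}\n")
            if sync_each:
                f.flush()
                os.fsync(f.fileno())  # force this record to disk before continuing
        f.flush()
        os.fsync(f.fileno())          # one final sync in both modes
    return time.perf_counter() - start


with tempfile.TemporaryDirectory() as tmp:
    count = 200  # arbitrary, small enough to finish quickly
    per_record = write_records(os.path.join(tmp, "each.dat"), count, sync_each=True)
    batched = write_records(os.path.join(tmp, "once.dat"), count, sync_each=False)
    print(f"fsync per record: {count / per_record:.0f} records/sec")
    print(f"single fsync:     {count / batched:.0f} records/sec")
```

<p>The absolute numbers depend entirely on the disk and filesystem; the point is only the gap between the two modes.</p>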
<p>Can anyone test it on a different filesystem?</p>
<p><ins>Update</ins> Around 17/sec with <code>reiserfs</code>. </p>
<p><ins>Update</ins> I had some trouble with the bulk update feature.  I switched from svn to the 0.7.2 release.  I got about 600/sec, which dropped to a steady-ish 350/sec when using sequential bulk updates of 100 docs. Two bulk updates in parallel yield about 950/sec initially, dropping to 550/sec after a while.   Three parallel updates yield similar performance.</p>
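<p>For reference, sequential bulk updates can be sketched roughly as follows.  This is not ser.py or par.py: the function names are hypothetical, and the transport is passed in as a plain callable so the sketch runs without a server.  The request body follows the <code>{"docs": [...]}</code> shape that current CouchDB releases accept on <code>_bulk_docs</code>; the 0.7.2 release may have differed.</p>

```python
import json
import time


def chunks(docs, batch_size):
    """Yield consecutive slices of at most batch_size documents."""
    for i in range(0, len(docs), batch_size):
        yield docs[i:i + batch_size]


def bulk_create(docs, batch_size, post):
    """Send docs in batches via post(body) and return documents per second."""
    start = time.perf_counter()
    for batch in chunks(docs, batch_size):
        post(json.dumps({"docs": batch}))  # assumed _bulk_docs body shape
    elapsed = time.perf_counter() - start
    return len(docs) / elapsed if elapsed > 0 else float("inf")


# Stand-in transport that only records request bodies; a real run would
# POST each body to the database's _bulk_docs URL instead.
sent = []
rate = bulk_create([{} for _ in range(250)], batch_size=100, post=sent.append)
print(f"{len(sent)} requests, {rate:.0f} docs/sec (no server involved)")
```

<p>Running several such loops in parallel threads, each with its own slice of the documents, would correspond to the parallel bulk updates mentioned above.</p>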
]]></content:encoded>
			<wfw:commentRss>http://blog.affien.com/archives/2008/03/30/couchdb-document-creation-performance/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

