<?xml version="1.0" encoding="utf-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments for Intrepid Blog</title>
	<atom:link href="http://blog.affien.com/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.affien.com</link>
	<description>A few thoughts</description>
	<lastBuildDate>Mon, 04 Feb 2013 07:59:50 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.1</generator>
	<item>
		<title>Comment on msgpack for pypy by Bas Westerbaan</title>
		<link>http://blog.affien.com/archives/2013/01/29/msgpack-for-pypy/comment-page-1/#comment-179135</link>
		<dc:creator>Bas Westerbaan</dc:creator>
		<pubDate>Mon, 04 Feb 2013 07:59:50 +0000</pubDate>
		<guid isPermaLink="false">http://blog.affien.com/?p=535#comment-179135</guid>
		<description>Great!  I am looking forward to it.</description>
		<content:encoded><![CDATA[<p>Great!  I am looking forward to it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on msgpack for pypy by John M. Camara</title>
		<link>http://blog.affien.com/archives/2013/01/29/msgpack-for-pypy/comment-page-1/#comment-179132</link>
		<dc:creator>John M. Camara</dc:creator>
		<pubDate>Mon, 04 Feb 2013 00:56:02 +0000</pubDate>
		<guid isPermaLink="false">http://blog.affien.com/?p=535#comment-179132</guid>
		<description>I plan to use msgpack in a project I&#039;m starting in 3-5 months.  At that point in time I&#039;ll take a look at using cffi and the msgpack library in general.  Since I knew I would be using it in the near future I figured I would give you some feedback while you were working on it.

Anyway, good luck on the project, and I will contribute a patch when I have the time to work on it.</description>
		<content:encoded><![CDATA[<p>I plan to use msgpack in a project I&#8217;m starting in 3-5 months.  At that point in time I&#8217;ll take a look at using cffi and the msgpack library in general.  Since I knew I would be using it in the near future I figured I would give you some feedback while you were working on it.</p>
<p>Anyway, good luck on the project, and I will contribute a patch when I have the time to work on it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on msgpack for pypy by Bas Westerbaan</title>
		<link>http://blog.affien.com/archives/2013/01/29/msgpack-for-pypy/comment-page-1/#comment-179130</link>
		<dc:creator>Bas Westerbaan</dc:creator>
		<pubDate>Sun, 03 Feb 2013 18:16:42 +0000</pubDate>
		<guid isPermaLink="false">http://blog.affien.com/?p=535#comment-179130</guid>
		<description>Under PyPy, &lt;code&gt;_fb_read&lt;/code&gt; does not create an extra string in the most common case.

I just edited my test script to use a preinitialized &lt;code&gt;Struct&lt;/code&gt;.  PyPy does not optimize it as well as calling just &lt;code&gt;struct.unpack&lt;/code&gt;.  That is a shortcoming of PyPy.

I agree that code should be cleanly designed without worrying about a specific implementation.  Only where performance is critical should we fiddle with toys such as the jitviewer.  The performance of msgpack on PyPy is critical for me, hence the effort.

I still am very doubtful that &lt;code&gt;cffi&lt;/code&gt; would increase performance, for the same reasons I gave before.  If you would contribute a &lt;code&gt;cffi&lt;/code&gt; backend to &lt;code&gt;msgpack-python&lt;/code&gt;, that would be great.</description>
		<content:encoded><![CDATA[<p>Under PyPy, <code>_fb_read</code> does not create an extra string in the most common case.</p>
<p>I just edited my test script to use a preinitialized <code>Struct</code>.  PyPy does not optimize it as well as calling just <code>struct.unpack</code>.  That is a shortcoming of PyPy.</p>
<p>I agree that code should be cleanly designed without worrying about a specific implementation.  Only where performance is critical should we fiddle with toys such as the jitviewer.  The performance of msgpack on PyPy is critical for me, hence the effort.</p>
<p>I still am very doubtful that <code>cffi</code> would increase performance, for the same reasons I gave before.  If you would contribute a <code>cffi</code> backend to <code>msgpack-python</code>, that would be great.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on msgpack for pypy by John M. Camara</title>
		<link>http://blog.affien.com/archives/2013/01/29/msgpack-for-pypy/comment-page-1/#comment-179129</link>
		<dc:creator>John M. Camara</dc:creator>
		<pubDate>Sun, 03 Feb 2013 17:20:21 +0000</pubDate>
		<guid isPermaLink="false">http://blog.affien.com/?p=535#comment-179129</guid>
		<description>Ah, sorry, there was a misunderstanding: I was not concerned about extra strings in struct.pack and struct.unpack.  The extra strings I was referring to are the ones created in, for example, the _fb_read method, where the return value ret is a new substring created by copying bytes from the original buffer.

The msgpack spec was designed to avoid this copying and to avoid type conversions.  In Python, though, you can only avoid the copying by using the unpack_from and pack_into methods.  Maybe PyPy can also avoid the type conversion, but I don&#039;t know if it is smart enough to do so at this time.

The only issue related to strings that I had with struct.pack and struct.unpack was the parsing of the format string.  I just wanted to make sure it was done only once for each msgpack type.  It looks like PyPy is able to clean this up, but I don&#039;t feel it&#039;s a good idea to code with a poorer design knowing that PyPy is able to clean it up.  You should be able to write good, clean Python code that will execute fast on PyPy rather than writing code based on your knowledge of how PyPy is able to make improvements, as what it&#039;s able to optimize will always be a moving target.  In this particular case, though, I don&#039;t believe PyPy would regress in a way that removes the optimizations that are helping you at this time.

Also, before the loop is JITed it has to run in the interpreter, so the earliest operations on each msgpack type will be slower due to parsing the format string.  And what about those who may want to use this library in CPython and decide not to use Cython?  The performance for them is going to truly suck big time, as the format string will have to be parsed every time.  So that&#039;s another reason to convert the code to use Struct objects.

We have been discussing how to improve the existing code, but I would just like to remind you that the best performance you are going to see from PyPy will very likely come from using cffi and the native C library.  That&#039;s basically the same approach that the Cython solution relies on.

Also, just to make this point a little stronger: the jitviewer should mainly be used by PyPy core developers and those building PyPy VMs.  A normal developer writing Python code to run on PyPy shouldn&#039;t need it.  They can use it to point out an inefficiency in PyPy to the core developers, but it should not be used as a way to get you to write Python code that has a better chance of being optimized under PyPy, except on very rare occasions, and even then such changes should only be made by those who closely follow and understand PyPy&#039;s development.

You can personally benefit much more from learning about good design and the benefits it provides than from learning how to write obscure Python code that runs fast in PyPy.  In general, a better design will also run faster in PyPy.</description>
		<content:encoded><![CDATA[<p>Ah, sorry, there was a misunderstanding: I was not concerned about extra strings in struct.pack and struct.unpack.  The extra strings I was referring to are the ones created in, for example, the _fb_read method, where the return value ret is a new substring created by copying bytes from the original buffer.</p>
<p>The msgpack spec was designed to avoid this copying and to avoid type conversions.  In Python, though, you can only avoid the copying by using the unpack_from and pack_into methods.  Maybe PyPy can also avoid the type conversion, but I don&#8217;t know if it is smart enough to do so at this time.</p>
<p>The only issue related to strings that I had with struct.pack and struct.unpack was the parsing of the format string.  I just wanted to make sure it was done only once for each msgpack type.  It looks like PyPy is able to clean this up, but I don&#8217;t feel it&#8217;s a good idea to code with a poorer design knowing that PyPy is able to clean it up.  You should be able to write good, clean Python code that will execute fast on PyPy rather than writing code based on your knowledge of how PyPy is able to make improvements, as what it&#8217;s able to optimize will always be a moving target.  In this particular case, though, I don&#8217;t believe PyPy would regress in a way that removes the optimizations that are helping you at this time.</p>
<p>Also, before the loop is JITed it has to run in the interpreter, so the earliest operations on each msgpack type will be slower due to parsing the format string.  And what about those who may want to use this library in CPython and decide not to use Cython?  The performance for them is going to truly suck big time, as the format string will have to be parsed every time.  So that&#8217;s another reason to convert the code to use Struct objects.</p>
<p>We have been discussing how to improve the existing code, but I would just like to remind you that the best performance you are going to see from PyPy will very likely come from using cffi and the native C library.  That&#8217;s basically the same approach that the Cython solution relies on.</p>
<p>Also, just to make this point a little stronger: the jitviewer should mainly be used by PyPy core developers and those building PyPy VMs.  A normal developer writing Python code to run on PyPy shouldn&#8217;t need it.  They can use it to point out an inefficiency in PyPy to the core developers, but it should not be used as a way to get you to write Python code that has a better chance of being optimized under PyPy, except on very rare occasions, and even then such changes should only be made by those who closely follow and understand PyPy&#8217;s development.</p>
<p>You can personally benefit much more from learning about good design and the benefits it provides than from learning how to write obscure Python code that runs fast in PyPy.  In general, a better design will also run faster in PyPy.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on msgpack for pypy by Bas Westerbaan</title>
		<link>http://blog.affien.com/archives/2013/01/29/msgpack-for-pypy/comment-page-1/#comment-179128</link>
		<dc:creator>Bas Westerbaan</dc:creator>
		<pubDate>Sun, 03 Feb 2013 11:49:26 +0000</pubDate>
		<guid isPermaLink="false">http://blog.affien.com/?p=535#comment-179128</guid>
		<description>&lt;a href=&quot;https://gist.github.com/4701471&quot; rel=&quot;nofollow&quot;&gt;This&lt;/a&gt; was the test file.  I used a file &quot;test&quot; with 1 MB of bogus data. &lt;a href=&quot;https://gist.github.com/4701475&quot; rel=&quot;nofollow&quot;&gt;Here&lt;/a&gt; is the PyPy log you can plug into jitviewer.</description>
		<content:encoded><![CDATA[<p><a href="https://gist.github.com/4701471" rel="nofollow">This</a> was the test file.  I used a file &#8220;test&#8221; with 1 MB of bogus data. <a href="https://gist.github.com/4701475" rel="nofollow">Here</a> is the PyPy log you can plug into jitviewer.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on msgpack for pypy by Bas Westerbaan</title>
		<link>http://blog.affien.com/archives/2013/01/29/msgpack-for-pypy/comment-page-1/#comment-179127</link>
		<dc:creator>Bas Westerbaan</dc:creator>
		<pubDate>Sun, 03 Feb 2013 11:45:55 +0000</pubDate>
		<guid isPermaLink="false">http://blog.affien.com/?p=535#comment-179127</guid>
		<description>Your assumption is that something like &lt;code&gt;struct.unpack(&quot;I&quot;, s[4*i:4*i+4])&lt;/code&gt; requires a function call and string copying.  I think this is not the case in PyPy.  To test this, I created a simple script and examined the generated assembly in jitviewer.  &lt;a href=&quot;http://w-nz.com/~bas/images/jitviewer1.png&quot; rel=&quot;nofollow&quot;&gt;Here&lt;/a&gt; you can see that it has actually been inlined.

So, for PyPy my approach seems optimal.  However, for plain CPython your approach could certainly be superior to mine.</description>
		<content:encoded><![CDATA[<p>Your assumption is that something like <code>struct.unpack("I", s[4*i:4*i+4])</code> requires a function call and string copying.  I think this is not the case in PyPy.  To test this, I created a simple script and examined the generated assembly in jitviewer.  <a href="http://w-nz.com/~bas/images/jitviewer1.png" rel="nofollow">Here</a> you can see that it has actually been inlined.</p>
<p>So, for PyPy my approach seems optimal.  However, for plain CPython your approach could certainly be superior to mine.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on msgpack for pypy by John M. Camara</title>
		<link>http://blog.affien.com/archives/2013/01/29/msgpack-for-pypy/comment-page-1/#comment-179114</link>
		<dc:creator>John M. Camara</dc:creator>
		<pubDate>Fri, 01 Feb 2013 22:05:03 +0000</pubDate>
		<guid isPermaLink="false">http://blog.affien.com/?p=535#comment-179114</guid>
		<description>I was going to suggest that you include __slots__ on the classes that are created or used in the pack/unpack process.  But since you&#039;re only creating basic types, this is not an issue.  I purposely left them out of my previous comment, because when some people learn about __slots__ they start adding them to most of the classes they create, which is not a good idea.  They are mainly good for classes that hold small amounts of data, in situations where you need to create a large number of instances.  When __slots__ are used appropriately, they reduce memory overhead and increase attribute access performance under CPython.  But using them comes at the cost of making your code less flexible.  BTW, under PyPy the optimizations provided by __slots__ are automatic, so adding them will not further increase performance or decrease memory size there, as you get these savings for free.

It turns out the performance issues you have are due to the algorithms you are using.  You really need to change the architecture to see any really big performance gains.  That&#039;s why I suggested that you don&#039;t create all these substrings from your buffer and instead keep track of your position in the message as you process it, using the unpack_from and pack_into methods.

Not creating all the substrings will reduce the memory overhead and the pressure on the garbage collector, while also giving you a significant boost in performance.

Now, I&#039;m not sure how much of the overhead created by all your if and elif statements PyPy is able to remove, but I know that implementing a more efficient algorithm that gets rid of all the if and elif statements will be even faster.

I haven&#039;t seen the output of jitviewer for this code.  As a matter of fact, I haven&#039;t even run the code.  By looking at the code I can see that its design could be improved in a number of areas.  I also see that the code is written in the style of C code.  In general, Python code written like C will perform poorly, as the run-time characteristics of C and Python code are quite different.  For example, function calls in C have very little overhead compared to the overhead they have in Python.  So calling out to lots of functions in a tight loop does not have much of an impact in C, but it does in CPython.

When I come across code that is slow, I generally do the following:
* Perform a quick code review to see if there are any obvious issues.
* Fix the obvious issues.
* If it is still too slow, I run a profiler.  In Python I use cProfile and look at the top few big hitters, or view the output of the profiler using a tool called RunSnakeRun, which gives you a great way to visualize the profile data and helps you quickly answer many questions you may have while looking at it.
* I then make updates to the code based on the profiler results, and repeat the profiling and code changes as necessary.
* Next, for a PyPy project and only in rare cases, I would look at the jitviewer if I still needed more performance.

The jitviewer should only be used in the **rare** cases where you need that little extra performance, and only after you know you have already implemented good algorithms and have reviewed the profiler results.

From looking at the code I would say you&#039;re nowhere near the point where you should consider using the jitviewer.  Of course, this is just my opinion.

Hopefully you find this useful.</description>
		<content:encoded><![CDATA[<p>I was going to suggest that you include __slots__ on the classes that are created or used in the pack/unpack process.  But since you&#8217;re only creating basic types, this is not an issue.  I purposely left them out of my previous comment, because when some people learn about __slots__ they start adding them to most of the classes they create, which is not a good idea.  They are mainly good for classes that hold small amounts of data, in situations where you need to create a large number of instances.  When __slots__ are used appropriately, they reduce memory overhead and increase attribute access performance under CPython.  But using them comes at the cost of making your code less flexible.  BTW, under PyPy the optimizations provided by __slots__ are automatic, so adding them will not further increase performance or decrease memory size there, as you get these savings for free.</p>
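<p>To make the __slots__ trade-off concrete, here is a minimal sketch.  The class names are made up for illustration and do not come from the msgpack code; the point is only that a slotted class has no per-instance __dict__ and therefore rejects attributes that were not declared.</p>

```python
class PointWithDict(object):
    """Ordinary class: every instance carries its own __dict__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointWithSlots(object):
    """Slotted class: attributes are fixed, no per-instance __dict__."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


plain = PointWithDict(1, 2)
slotted = PointWithSlots(1, 2)

# The ordinary instance accepts new attributes at any time.
plain.z = 3

# The slotted instance does not: this is the loss of flexibility.
try:
    slotted.z = 3
    accepts_new_attribute = True
except AttributeError:
    accepts_new_attribute = False
```

<p>How much memory this saves depends on the interpreter and on how many instances exist, so it is worth measuring before adding __slots__ anywhere.</p>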
<p>It turns out the performance issues you have are due to the algorithms you are using.  You really need to change the architecture to see any really big performance gains.  That&#8217;s why I suggested that you don&#8217;t create all these substrings from your buffer and instead keep track of your position in the message as you process it, using the unpack_from and pack_into methods.</p>
<p>Not creating all the substrings will reduce the memory overhead and the pressure on the garbage collector, while also giving you a significant boost in performance.</p>
<p>Now, I&#8217;m not sure how much of the overhead created by all your if and elif statements PyPy is able to remove, but I know that implementing a more efficient algorithm that gets rid of all the if and elif statements will be even faster.</p>
<p>I haven&#8217;t seen the output of jitviewer for this code.  As a matter of fact, I haven&#8217;t even run the code.  By looking at the code I can see that its design could be improved in a number of areas.  I also see that the code is written in the style of C code.  In general, Python code written like C will perform poorly, as the run-time characteristics of C and Python code are quite different.  For example, function calls in C have very little overhead compared to the overhead they have in Python.  So calling out to lots of functions in a tight loop does not have much of an impact in C, but it does in CPython.</p>
<p>When I come across code that is slow, I generally do the following:<br />
* Perform a quick code review to see if there are any obvious issues.<br />
* Fix the obvious issues.<br />
* If it is still too slow, I run a profiler.  In Python I use cProfile and look at the top few big hitters, or view the output of the profiler using a tool called RunSnakeRun, which gives you a great way to visualize the profile data and helps you quickly answer many questions you may have while looking at it.<br />
* I then make updates to the code based on the profiler results, and repeat the profiling and code changes as necessary.<br />
* Next, for a PyPy project and only in rare cases, I would look at the jitviewer if I still needed more performance.</p>
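<p>The profiling step above could look roughly like the following sketch.  It uses only the standard cProfile and pstats modules; slow_unpack is a made-up stand-in for whatever code is under suspicion, not a function from the msgpack library.</p>

```python
import cProfile
import io
import pstats
import struct


def slow_unpack(buf):
    """Hypothetical hot spot: slices the buffer once per 4-byte value."""
    return [struct.unpack(">I", buf[i:i + 4])[0]
            for i in range(0, len(buf), 4)]


# Build 1000 big-endian unsigned 32-bit integers as test input.
buf = struct.pack(">1000I", *range(1000))

profiler = cProfile.Profile()
profiler.enable()
result = slow_unpack(buf)
profiler.disable()

# Collect the report in memory, sorted by cumulative time, top 5 entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

<p>The entries at the top of the report are the "big hitters" to look at first; the same data can be written to a file with dump_stats for use in a visualizer.</p>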
<p>The jitviewer should only be used in the **rare** cases where you need that little extra performance, and only after you know you have already implemented good algorithms and have reviewed the profiler results.</p>
<p>From looking at the code I would say you&#8217;re nowhere near the point where you should consider using the jitviewer.  Of course, this is just my opinion.</p>
<p>Hopefully you find this useful.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on msgpack for pypy by Bas Westerbaan</title>
		<link>http://blog.affien.com/archives/2013/01/29/msgpack-for-pypy/comment-page-1/#comment-179113</link>
		<dc:creator>Bas Westerbaan</dc:creator>
		<pubDate>Fri, 01 Feb 2013 20:29:21 +0000</pubDate>
		<guid isPermaLink="false">http://blog.affien.com/?p=535#comment-179113</guid>
		<description>PyPy (&gt;= 2.0) optimizes away calls to struct.unpack with constant format strings, so initializing separate Struct objects won&#039;t increase performance on PyPy.  Have you played with jitviewer?

What was your idea for minimizing memory usage on CPython?</description>
		<content:encoded><![CDATA[<p>PyPy (>= 2.0) optimizes away calls to struct.unpack with constant format strings, so initializing separate Struct objects won&#8217;t increase performance on PyPy.  Have you played with jitviewer?</p>
<p>What was your idea for minimizing memory usage on CPython?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on msgpack for pypy by John M. Camara</title>
		<link>http://blog.affien.com/archives/2013/01/29/msgpack-for-pypy/comment-page-1/#comment-179110</link>
		<dc:creator>John M. Camara</dc:creator>
		<pubDate>Fri, 01 Feb 2013 02:08:17 +0000</pubDate>
		<guid isPermaLink="false">http://blog.affien.com/?p=535#comment-179110</guid>
		<description># define a struct for each type of msgpack type
float_struct = struct.Struct(&quot;&gt;f&quot;)
float_struct = struct.Struct(&quot;&gt;d&quot;)
...

should have been

# define a struct for each type of msgpack type
float_struct = struct.Struct(&quot;&gt;f&quot;)
double_struct = struct.Struct(&quot;&gt;d&quot;)
...

Change was s/float_struct/double_struct/</description>
		<content:encoded><![CDATA[<p># define a struct for each type of msgpack type<br />
float_struct = struct.Struct("&gt;f")<br />
float_struct = struct.Struct("&gt;d")<br />
...</p>
<p>should have been</p>
<p># define a struct for each type of msgpack type<br />
float_struct = struct.Struct("&gt;f")<br />
double_struct = struct.Struct("&gt;d")<br />
...</p>
<p>Change was s/float_struct/double_struct/</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on msgpack for pypy by John M. Camara</title>
		<link>http://blog.affien.com/archives/2013/01/29/msgpack-for-pypy/comment-page-1/#comment-179109</link>
		<dc:creator>John M. Camara</dc:creator>
		<pubDate>Fri, 01 Feb 2013 01:54:52 +0000</pubDate>
		<guid isPermaLink="false">http://blog.affien.com/?p=535#comment-179109</guid>
		<description>I was going to offer a suggestion that may help reduce the memory footprint under CPython, but I decided to take a peek at the code before suggesting it.

Looking at the code, there appear to be many opportunities to make it faster.  So here are a few suggestions.

There are a couple of ways in which you can use the struct module.  You can use the functions or the Struct class.  When you use the functions, the format string has to be parsed on every call, whereas if you create a Struct object the format is parsed just once and you can call its methods as often as you like.

so instead of doing this inside a loop
struct.unpack(&quot;&gt;f&quot;, some_string)

do something like

# outside the loop
float_struct = struct.Struct(&quot;&gt;f&quot;)

# inside the loop
float_struct.unpack(some_string)

The idea is to create one struct object for each type of struct you need to deal with.

Another area to increase performance is to reduce all the string copying that is going on.  The msgpack spec was designed in a way that reduces the need to copy data around.  Unfortunately, Python can&#039;t take full advantage of this the way you can in, say, C, where you can do pointer arithmetic and a cast.  Maybe PyPy has something to help in this regard, but I don&#039;t know off the top of my head.  What you can do in Python, at least, is keep track of the position in the message as you process it and use the pack_into and unpack_from methods provided by the struct module.  These methods take a buffer and an offset, which is your current position.

By making this change you will also eliminate the need to use PyPy&#039;s StringBuilder.

Another area for improvement is to eliminate the heavy use of if and elif statements, such as the ones used in the _fb_unpack method.  Since the types in msgpack can be determined by reading one byte, I would allocate a list of 256 tuples to aid in the unpacking process.  Each tuple would contain two values.  The first would be a reference to the struct object used to pack/unpack, and the second would be a type constant such as TYPE_IMMEDIATE, TYPE_RAW, TYPE_ARRAY, etc.  So you would do something like this to set up the list

# define a struct for each type of msgpack type
float_struct = struct.Struct(&quot;&gt;f&quot;)
float_struct = struct.Struct(&quot;&gt;d&quot;)
...

# create a list to hold 256 tuples of unpack operations
unpack_operations = [None]*256

# create the tuples for the operations
unpack_operations[0x00] = ...
...
unpack_operations[0xca] = (float_struct, TYPE_IMMEDIATE)
unpack_operations[0xcb] = (double_struct, TYPE_IMMEDIATE)
...
unpack_operations[0xff] = ...

For types such as Positive FixNum, FixMap, FixArray, FixRaw, and Negative FixNum, you could set up one tuple for each of them and use a for loop to add references to these tuples to the list.  You can do the same thing for the reserved types.

Then inside the _fb_unpack method you could do something like

struct_object, typ = unpack_operations[b]
struct_object.unpack_from(buffer, position)

# then do processing for the typ

That way you can eliminate all those if elif statements and increase the speed of the code as a lot less work would need to be done.

A further improvement would be to code up each of the typ operations as a new private method; instead of the second item in the tuple being TYPE_IMMEDIATE, TYPE_RAW, TYPE_ARRAY, etc., it would hold a reference to the corresponding method.

I suspect there are quite a few additional improvements that could be made, but this should be a good starting point.  I didn&#039;t really take a good look at all the code, so these suggestions are just from a quick look.  Of course, these suggestions also apply to the pack side.</description>
		<content:encoded><![CDATA[<p>I was going to offer a suggestion that may help reduce the memory footprint under CPython, but I decided to take a peek at the code before suggesting it.</p>
<p>Looking at the code, there appear to be many opportunities to make it faster.  So here are a few suggestions.</p>
<p>There are a couple of ways in which you can use the struct module.  You can use the functions or the Struct class.  When you use the functions, the format string has to be parsed on every call, whereas if you create a Struct object the format is parsed just once and you can call its methods as often as you like.</p>
<p>so instead of doing this inside a loop<br />
struct.unpack("&gt;f", some_string)</p>
<p>do something like</p>
<p># outside the loop<br />
float_struct = struct.Struct("&gt;f")</p>
<p># inside the loop<br />
float_struct.unpack(some_string)</p>
<p>The idea is to create one struct object for each type of struct you need to deal with.</p>
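<p>As a runnable version of the idea, here is a minimal sketch, assuming Python 3 and byte strings as input.  The helper name unpack_floats is made up for illustration.</p>

```python
import struct

# Compile the format once, outside the hot loop.
float_struct = struct.Struct(">f")


def unpack_floats(chunks):
    """Unpack 4-byte big-endian floats with the precompiled Struct."""
    unpack = float_struct.unpack  # look the method up once
    return [unpack(chunk)[0] for chunk in chunks]


# Three values that are exactly representable as 32-bit floats.
chunks = [struct.pack(">f", x) for x in (0.5, 1.5, -2.0)]
values = unpack_floats(chunks)
```

<p>Note that CPython&#8217;s struct module also caches recently compiled format strings for the module-level functions, so the gain there may be modest; measuring is the only way to know for a given workload.</p>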
<p>Another area to increase performance is to reduce all the string copying that is going on.  The msgpack spec was designed in a way that reduces the need to copy data around.  Unfortunately, Python can&#8217;t take full advantage of this the way you can in, say, C, where you can do pointer arithmetic and a cast.  Maybe PyPy has something to help in this regard, but I don&#8217;t know off the top of my head.  What you can do in Python, at least, is keep track of the position in the message as you process it and use the pack_into and unpack_from methods provided by the struct module.  These methods take a buffer and an offset, which is your current position.</p>
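<p>A minimal sketch of the position-tracking approach, assuming Python 3 and a buffer of big-endian unsigned 32-bit integers.  The function name read_uint32s is hypothetical; only Struct.unpack_from and Struct.size come from the struct module.</p>

```python
import struct

uint32_struct = struct.Struct(">I")


def read_uint32s(buf, count, position=0):
    """Read `count` values from `buf`, starting at `position`.

    No substrings are created: unpack_from reads directly from the
    buffer at the given offset, and the offset is advanced by hand.
    Returns the values and the position just past the last one.
    """
    values = []
    for _ in range(count):
        (value,) = uint32_struct.unpack_from(buf, position)
        values.append(value)
        position += uint32_struct.size
    return values, position


buf = struct.pack(">III", 1, 2, 3)
values, position = read_uint32s(buf, 3)
```

<p>Returning the new position alongside the values lets the caller continue parsing the rest of the message from where this read stopped.</p>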
<p>By making this change you will also eliminate the need to use PyPy&#8217;s StringBuilder.</p>
<p>Another area for improvement is to eliminate the heavy use of if and elif statements, such as the ones used in the _fb_unpack method.  Since the types in msgpack can be determined by reading one byte, I would allocate a list of 256 tuples to aid in the unpacking process.  Each tuple would contain two values.  The first would be a reference to the struct object used to pack/unpack, and the second would be a type constant such as TYPE_IMMEDIATE, TYPE_RAW, TYPE_ARRAY, etc.  So you would do something like this to set up the list</p>
<p># define a struct for each type of msgpack type<br />
float_struct = struct.Struct(&#8220;&gt;f&#8221;)<br />
float_struct = struct.Struct(&#8220;&gt;d&#8221;)<br />
&#8230;</p>
<p># create a list to hold 256 tuples of unpack operations<br />
unpack_operations = [None]*256</p>
<p># create the tuples for the operations<br />
unpack_operations[0x00] = &#8230;<br />
&#8230;<br />
unpack_operations[0xca] = (float_struct, TYPE_IMMEDIATE)<br />
unpack_operations[0xcb] = (double_struct, TYPE_IMMEDIATE)<br />
&#8230;<br />
unpack_operations[0xff] = &#8230;</p>
<p>For types such as Positive FixNum, FixMap, FixArray, FixRaw, and Negative FixNum, you could set up one tuple for each of them and use a for loop to add references to these tuples to the list.  You can do the same thing for the reserved types.</p>
<p>Then inside the _fb_unpack method you could do something like</p>
<p>struct_object, typ = unpack_operations[b]<br />
struct_object.unpack_from(buffer, position)</p>
<p># then do processing for the typ</p>
<p>That way you can eliminate all those if elif statements and increase the speed of the code as a lot less work would need to be done.</p>
<p>A further improvement would be to code up each of the typ operations as a new private method; instead of the second item in the tuple being TYPE_IMMEDIATE, TYPE_RAW, TYPE_ARRAY, etc., it would hold a reference to the corresponding method.</p>
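<p>Putting the table and the per-type handlers together, a small sketch might look like this.  It assumes Python 3, covers only a handful of msgpack type bytes (fixnums, float, double, uint8, uint16), and every function name in it is made up for illustration rather than taken from the existing code.</p>

```python
import struct

float_struct = struct.Struct(">f")
double_struct = struct.Struct(">d")
uint8_struct = struct.Struct(">B")
uint16_struct = struct.Struct(">H")


def make_struct_handler(st):
    """Build a handler that reads one value with `st` after the type byte."""
    def handler(buf, position, type_byte):
        (value,) = st.unpack_from(buf, position + 1)
        return value, position + 1 + st.size
    return handler


def positive_fixnum(buf, position, type_byte):
    # 0x00..0x7f: the type byte itself is the value.
    return type_byte, position + 1


def negative_fixnum(buf, position, type_byte):
    # 0xe0..0xff encode -32..-1.
    return type_byte - 256, position + 1


def unsupported(buf, position, type_byte):
    raise ValueError("type byte 0x%02x not handled in this sketch" % type_byte)


# One entry per possible type byte; unknown types fall through to an error.
unpack_operations = [unsupported] * 256
for b in range(0x00, 0x80):
    unpack_operations[b] = positive_fixnum
for b in range(0xe0, 0x100):
    unpack_operations[b] = negative_fixnum
unpack_operations[0xca] = make_struct_handler(float_struct)
unpack_operations[0xcb] = make_struct_handler(double_struct)
unpack_operations[0xcc] = make_struct_handler(uint8_struct)
unpack_operations[0xcd] = make_struct_handler(uint16_struct)


def unpack_one(buf, position=0):
    """Decode one value; return it with the position of the next one."""
    type_byte = buf[position]  # indexing bytes yields an int in Python 3
    return unpack_operations[type_byte](buf, position, type_byte)
```

<p>For example, unpack_one(b"\xcd\x01\x00") looks up entry 0xcd, reads a 16-bit integer from offset 1, and reports that the next value starts at offset 3.  Containers such as arrays and maps would need handlers that call unpack_one repeatedly, which this sketch leaves out.</p>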
<p>I suspect there are quite a few additional improvements that could be made, but this should be a good starting point.  I didn&#8217;t really take a good look at all the code, so these suggestions are just from a quick look.  Of course, these suggestions also apply to the pack side.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
