When I slowly build a big string out of little bits, the worst thing to do in most languages is to just use string concatenation:
$str .= $little_bit;
Why? Every time a little bit is added to str, a new string must be allocated that is big enough to contain str and the new little bit. Then the current contents of str must be copied in. Most languages have constructs for building strings like this efficiently instead of concatenating: StringBuilder in C#, StringIO in Python.
But no, PHP has to be stupid. There is no nice construct, and you’ll end up using concatenation. So I thought I’d be smart and make use of PHP’s arrays and implode. Arrays are meant to have elements added and removed all the time, so they are properly buffered and should be great at having lots of small elements added. And when I want to pack it all into one big string, I can use PHP’s builtin implode.
I wanted to try it out and created two scripts: a.php concatenates a little (10-byte) string one million times, and b.php appends it to an array and then implodes it. And because I’m also interested in the performance of implode, I wrote a third script, c.php, that’s identical to b.php but doesn’t implode afterwards. These are the results:
|Script|Time|
|---|---|
|b.php (array append and implode)|0.814s|
|c.php (array append)|0.732s|
Indeed, string concatenation, with all its allocation and copying, is actually faster than plain array appending. PHP is stupid.
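The post doesn’t include the three scripts, so here is only a hedged reconstruction from their descriptions. The function names, the 10-byte string and the timing helper are assumptions, not the author’s originals.

```php
<?php
// Hypothetical reconstruction of a.php, b.php and c.php as three functions.
const BIT = '0123456789'; // assumed 10-byte "little bit"

// a.php: build the string with plain concatenation
function build_by_concat(int $n): string
{
    $str = '';
    for ($i = 0; $i < $n; $i++) {
        $str .= BIT;
    }
    return $str;
}

// b.php: append every bit to an array, then implode once at the end
function build_by_implode(int $n): string
{
    $parts = [];
    for ($i = 0; $i < $n; $i++) {
        $parts[] = BIT;
    }
    return implode('', $parts);
}

// c.php: identical to b.php, but without the final implode
function append_only(int $n): array
{
    $parts = [];
    for ($i = 0; $i < $n; $i++) {
        $parts[] = BIT;
    }
    return $parts;
}

// Wall-clock time of one call, in seconds
function time_call(callable $f, int $n): float
{
    $start = microtime(true);
    $f($n);
    return microtime(true) - $start;
}

$n = 1000000; // one million appends, as in the post
foreach (['build_by_concat', 'build_by_implode', 'append_only'] as $name) {
    printf("%-16s %.3fs\n", $name, time_call($name, $n));
}
```

This times only the loops inside a single process, while the numbers in the table were presumably whole-script runs on a much older PHP, so absolute values won’t be comparable; only the relative ordering is of interest.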
11 thoughts on “Stupid PHP (1) (Strings are faster than Arrays)”
I’ve always wanted to see a language that uses something STL-rope-like for strings and string operations, whereby a concat would just create a linked list, and character removal or changing would splice the component pieces being modified. The indirection would hurt and there’d be some significant memory overhead, but I think it’d be interesting to see just how bad the final tradeoff is for easily modifiable strings.
Found your blog searching for syslets information. Good stuff, thanks for writing.
Would be nice. Although the random access would be hard.
Thanks for liking my humble blog.
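The rope idea from the first comment, and the random-access worry in the reply, can be sketched in a few lines. This is a hypothetical Rope class, assuming PHP 7.4+ for typed properties. It keeps a flat list of pieces in a PHP array rather than a linked list or the tree a real rope uses, so concat never copies earlier pieces, but charAt has to walk the pieces.

```php
<?php
// Hypothetical rope-like string: concat records the piece; joining is deferred.
final class Rope
{
    /** @var string[] */
    private array $pieces = [];
    private int $length = 0;

    public function concat(string $piece): self
    {
        $this->pieces[] = $piece; // earlier pieces are never copied
        $this->length += strlen($piece);
        return $this;
    }

    public function length(): int
    {
        return $this->length;
    }

    // Random access walks the pieces: O(number of pieces) in this flat version.
    public function charAt(int $index): string
    {
        if ($index < 0 || $index >= $this->length) {
            throw new OutOfRangeException("index $index out of range");
        }
        foreach ($this->pieces as $piece) {
            $len = strlen($piece);
            if ($index < $len) {
                return $piece[$index];
            }
            $index -= $len; // skip this piece
        }
        throw new LogicException('unreachable');
    }

    // Only here are all pieces copied into one string.
    public function __toString(): string
    {
        return implode('', $this->pieces);
    }
}

$r = (new Rope())->concat('Hello, ')->concat('world');
echo $r->charAt(7), PHP_EOL; // w
echo $r, PHP_EOL;            // Hello, world
```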
PHP is stupid because string concatenation is fast? That’s an odd claim. I would expect a language whose bread and butter is text manipulation to have efficient text manipulation functionality (whether PHP actually is efficient is another discussion).
It’s a standard CS trick to allocate more than you need. With strings, it is sometimes wise to allocate double the current size when you run out of space, if you assume you’re going to be doing a lot of appending.
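As a rough illustration of why doubling helps, here is a small simulation. It is a model under stated assumptions, not PHP’s actual allocator: the hypothetical function bytes_copied counts every byte written into the buffer, including the old contents moved on each reallocation, for exact-fit growth versus capacity doubling.

```php
<?php
// Model of an append-only buffer (an assumption, not PHP's allocator).
// Counts every byte written: the new bit, plus the old contents each time
// the buffer is reallocated.
function bytes_copied(int $n, int $bitLength, bool $doubling): int
{
    $capacity = 0; // bytes currently allocated
    $length = 0;   // bytes currently in use
    $copied = 0;   // running total of bytes copied
    for ($i = 0; $i < $n; $i++) {
        $needed = $length + $bitLength;
        if ($needed > $capacity) {
            // Grow exactly to fit, or to at least twice the old capacity.
            $capacity = $doubling ? max($needed, 2 * $capacity) : $needed;
            $copied += $length; // old contents move to the new block
        }
        $copied += $bitLength;  // the new bit is written once
        $length = $needed;
    }
    return $copied;
}

echo bytes_copied(1000000, 10, false), PHP_EOL; // 5000005000000
echo bytes_copied(1000000, 10, true), PHP_EOL;  // 20485750
```

For one million 10-byte bits, exact-fit growth copies about 5 TB under this model, while doubling copies roughly twice the final 10 MB. Since the concatenation benchmark in the post finishes in under a second, PHP presumably avoids a full copy on most appends.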
Whether PHP uses that doubling method, I don’t know, but there’s no inherent reason why string manipulation should be slower than arrays, especially when your “little_bit” is no bigger than a few sizeof(long)s worth of data (which should be roughly the size of an array element).
You know whether you need a string just for its value or a string to expand. Most sane languages provide two different objects for these two very distinct needs: str and StringIO in Python, String and StringBuilder in .NET, and so on.
To put it explicitly: it’s a stupid waste to add a buffer to every string, and it’s a stupid lack of a feature not to have a way to efficiently build a string. I don’t know which is the case for PHP, but it’s one of them.
Each time you concatenate a string, as in $a .= 'aaa', a new string instance is made and the whole content is copied from the old $a to the new string. Then $a is set to the new string, and if there are no more references to the old string, it is freed. When you build a big string from n roughly equally sized parts (say of average size s), you’ll allocate about 0.5*s*n^2 bytes in total and copy about that amount of memory too.
In contrast, if you’ve got an array, which is probably implemented as an array of pointers, you can collect all the strings in order with about s*n+n of memory (and this even includes the little bits themselves!). When imploding the array, you only need to allocate n*s of memory and copy that same amount.
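Written out, the two estimates in this comment come from summing the sizes of the intermediate strings (the extra n term stands for the per-element pointer overhead, as in the comment):

```latex
\begin{align*}
\text{concatenation:} &\quad \sum_{i=1}^{n} i\,s \;=\; s\,\frac{n(n+1)}{2} \;\approx\; \tfrac{1}{2}\,s\,n^{2} \\
\text{array + implode:} &\quad \underbrace{s\,n + n}_{\text{collect}} \;+\; \underbrace{s\,n}_{\text{implode}} \;=\; O(s\,n)
\end{align*}
```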
Especially bearing in mind that memory usage is the big bottleneck for algorithms these days, the array approach should be faster. Way faster.
I don’t know whether PHP’s concatenation is fast compared to other languages (probably not, though ;)). The really bad part is that its array appending and imploding is slower than that concatenation.
It’s not strange to cry out loud “Stupid PHP” when a theoretically far inferior algorithm turns out to be faster. It goes against all my intuition.
There are more stupid reasons why PHP is not yet a full enterprise development language. I reported a bug in PHP’s exception handling through the bug report system at http://www.php.net.
This should catch an exception, but it dies with a fatal error saying that non_existant_function() does not exist 😀. It’s really stupid. Mr. Rasmus at php.net instantly replied to my bug report, calling it a “bogus” bug (again, more stupid), and commented that exceptions in PHP work only in OO code. But to my amusement it didn’t work there either: even when I wrote OO code to catch the exception, it still died with a fatal error.
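For what it’s worth, later PHP versions changed this. Assuming PHP 7 or newer, calling an undefined function throws an Error, which implements Throwable but does not extend Exception, so a catch (Exception $e) block still won’t see it. A minimal sketch (call_missing is a hypothetical helper):

```php
<?php
// Assumes PHP 7+: an undefined-function call throws Error instead of a fatal error.
// Error does not extend Exception, so only the second catch block matches.
function call_missing(): string
{
    try {
        non_existant_function();
        return 'no error';
    } catch (Exception $e) {
        return 'Exception: ' . $e->getMessage();
    } catch (Error $e) {
        return get_class($e) . ': ' . $e->getMessage();
    }
}

echo call_missing(), PHP_EOL;
```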
Well... it’s probably not a bug, then. Even worse, it’s a feature omission.
That PHP is designed for internet scripting doesn’t excuse it for some of its stupidities. There is no reason not to include a StringBuilder. No reason to have a demented object model. Etcetera.
(But to be honest: in most cases PHP does indeed get the job done quite easily and quickly. It’s just that nagging.)
Hi, thanks for the benchmark, though I think it’s not in the spirit of PHP to have separate string implementations. E.g. array() is used to represent what should be two totally separate types of data structures… though I suspect lists are just internally represented as hashes, which might explain their relatively poor performance here.
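To illustrate the point that one array() type covers both roles, here is a tiny example (the variable names and values are arbitrary):

```php
<?php
// A single PHP array type acts as both an ordered list and a hash map.
$list = [];
$list[] = 'first';   // appended with integer key 0
$list[] = 'second';  // appended with integer key 1

$map = ['name' => 'value'];
$map[] = 'appended'; // same type: gets the next free integer key, 0

print_r(array_keys($list));
print_r(array_keys($map));
```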
Anyway, I think you’d have to extend your test before you can draw conclusions about how PHP allocates memory. E.g. store a reference to the intermediate value from each concatenation, to make sure PHP isn’t reference-counting and toggling between immutable and mutable representations accordingly, etc.
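A hedged sketch of the experiment this comment proposes. The function names are hypothetical, and the comments assume the engine can only extend a string in place when nothing else references it. n is kept small because keeping every intermediate makes memory grow quadratically.

```php
<?php
// Hypothetical sketch of the suggested experiment.
// Assumption: the engine can extend a string in place only when nothing
// else references it; keeping every intermediate value rules that out.
const BIT = '0123456789';

// Plain concatenation: the string being built is referenced only by $str
function concat_plain(int $n): string
{
    $str = '';
    for ($i = 0; $i < $n; $i++) {
        $str .= BIT;
    }
    return $str;
}

// Same loop, but every intermediate value stays alive in $kept
function concat_keeping_intermediates(int $n): string
{
    $str = '';
    $kept = [];
    for ($i = 0; $i < $n; $i++) {
        $kept[] = $str; // a second reference to the current string
        $str .= BIT;    // so the append has to produce a separate new string
    }
    return $str;
}

// Keep n small: $kept holds about 0.5 * 10 * n^2 bytes at the end.
$n = 2000;
foreach (['concat_plain', 'concat_keeping_intermediates'] as $name) {
    $start = microtime(true);
    $name($n);
    printf("%-28s %.4fs\n", $name, microtime(true) - $start);
}
```

If the second variant is dramatically slower per append, that supports the idea that plain .= usually avoids the full copy the post assumes.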
Anyway, blah blah, thanks
2013. Lol. Hey, nice article. It worked for me because I had a creepy feeling that I shouldn’t be mucking around with arrays in PHP and should choose another method; I just didn’t know how bad it could be!
A couple of other random comments aimed at some other responders:
@Axel: the point is not “PHP is stupid because strings perform well.” The author never said the string performance was great. In fact it’s not, so the point seems to be how un#*%&ing-believable it is that arrays perform worse than STRINGS.
@Php_tester: (looks like Bas answered this) and in 2013 there’s Node.js… and even if it weren’t for Node, it isn’t the language itself that is slow; it’s the fact that, generally speaking, it’s a browser implementation vs. PHP, which is a server implementation. Comparing apples to apples (nowadays), JS blows PHP away. If it weren’t for WordPress being written in PHP, I’d never have gone very deep into it.