Last year, my brother discovered a seemingly inexplicable quirk (or bug) in PHP. He was manipulating an array while holding references to some of its items, then making a copy of the array and manipulating the copy. Bafflingly, this was somehow modifying the original array too. He certainly knows what he's doing, and there weren't any rookie mistakes like shallow copying of objects going on. He was just working with primitive types like ints. A simple example of what was happening can be run online here, or you can see it below:
$arr = array("a");
$a = &$arr[0];
$arr2 = $arr;
$arr2[0] = "b";
print_r($arr);
This simple code prints:
Array
(
[0] => b
)
Without the reference line, the value stays "a". Somehow the mere act of taking a reference to a primitive inside an array changes the behavior of any future copies of that array. How can that possibly make any sense?
It took a while to figure out, but var_dump() is a very handy function for these situations. If we dump the array before and after assigning the reference, we can see what's going on:
array(1) {
  [0]=>
  string(1) "a"
}
array(1) {
  [0]=>
  &string(1) "a"
}
Since PHP arrays can be either lists or dictionaries, the engine has no way to simply reference the address of an item inside one. Instead, when you create a reference to a primitive in an array, PHP converts the array slot itself into a reference to that primitive. Both the array and the other variable then point to the same stored value. Usually this is fine: a reference to an int can be used as if it were the int itself. However, the fact that nothing visibly changes is what makes this bug so insidious. If you later copy the array, you will assume the element is still a raw int. Since array copies are shallow, the copy shares that reference, so modifying the element in the copy changes the pointed-to value everywhere.
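To make the sharing concrete, here is a minimal sketch with two elements, only one of which has had a reference taken. The variable names ($original, $copy, $ref) are mine, chosen for illustration. Only the referenced slot leaks between the copies; the ordinary slot behaves the way you would expect.

```php
<?php
// Two elements: only index 0 gets turned into a reference slot.
$original = ["a", "x"];
$ref = &$original[0];

$copy = $original;  // shallow copy: slot 0 is shared, slot 1 is not
$copy[0] = "b";     // writes through the shared reference
$copy[1] = "y";     // writes only to the copy

// $original[0] is now "b" (changed through the copy)
// $original[1] is still "x"
// $ref is "b" as well, since it points at the same stored value
var_dump($original);
```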
This is a reported, if little-known, bug. The proposed fix was to make arrays deep copy by default, but this was deemed too costly. I would argue that a far better fix would be to supplement arrays in some way that lets references point to their items without modifying the array, but I am blissfully unaware of how challenging this would be to implement.
If you do want to work around this, there is no clear, right way to do so. Not only does PHP shallow clone by default, it does not have a built-in way to deep clone an array, nor to dereference a reference.
Obviously, the first—and probably best—option is to avoid ever making a reference to an array item. This behavior means it will almost inevitably come back to bite you, and the hassle of just storing a key and/or reference to the array is surely less than the hassle of trying to figure out how your data is being secretly corrupted. As it stands, the ability to store references to primitive array values is essentially an anti-feature.
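As a rough sketch of what storing a key looks like in practice (the $scores array and its contents are made up for illustration), you keep the key around and index into the array each time instead of binding a variable to the element:

```php
<?php
$scores = ["alice" => 1, "bob" => 2];

$key = "alice";       // remember which item we care about
$scores[$key] += 10;  // modify the item through its key, no reference needed

$snapshot = $scores;  // ordinary copy: no slot was ever made a reference
$snapshot[$key] = 0;  // only the snapshot changes

// $scores["alice"] is still 11 here
var_dump($scores);
```

It is slightly more typing than a reference, but copies of the array stay independent.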
If you really want to deal with the worst parts of PHP, the second option would be to use references freely, and then manually copy arrays a level deeper.
Instead of doing $array_b = $array_a, do:
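$array_b = [];
// Copy element by element; reading each item yields its plain value, not the reference.
// foreach also handles string keys and gaps, which a counted for loop would not.
foreach ($array_a as $key => $value) {
    $array_b[$key] = $value;
}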
This copies the referenced values rather than the references themselves, at a slight performance cost.
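If the array has nested arrays that may also contain references, the same idea can be applied recursively. The helper below is a sketch of my own, not something PHP ships with; the name copy_by_value is hypothetical. Note that objects inside the array are still shared handles, since this only deals with arrays and primitives.

```php
<?php
// Hypothetical helper (not a PHP built-in): rebuilds an array so that
// no slot in the result is shared with a reference held elsewhere.
function copy_by_value(array $source): array
{
    $copy = [];
    foreach ($source as $key => $value) {
        // $value is the plain value; recurse so nested arrays are rebuilt too.
        $copy[$key] = is_array($value) ? copy_by_value($value) : $value;
    }
    return $copy;
}

$array_a = ["x" => "a", "nested" => ["a"]];
$top = &$array_a["x"];            // reference at the top level
$inner = &$array_a["nested"][0];  // reference one level down

$array_b = copy_by_value($array_a);
$array_b["x"] = "b";
$array_b["nested"][0] = "b";

// $array_a still holds "a" in both places
var_dump($array_a);
```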
The third option would be to convert one of the arrays to all literals:
$a = unserialize(serialize($a));
This is probably even slower and might not be an option if you have some complex objects mixed in with primitive types, but that is its own footgun.
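For completeness, here is the round trip applied to the original example, as a small sketch with my own variable names. The reference is held by a variable outside the array, so the serialized form contains only the plain value and the rebuilt array shares nothing with the original.

```php
<?php
$arr = ["a"];
$ref = &$arr[0];                       // slot 0 is now a reference

$arr2 = unserialize(serialize($arr));  // rebuild the array from plain values
$arr2[0] = "b";                        // touches only the rebuilt array

// $arr[0] is still "a"
var_dump($arr);
```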
The fourth option, and the one I would endorse if you must use such references, is to manually delete them when you are done with them. Due to the really funky way PHP handles references, if you destroy either holder of the reference, the remaining one will, even more surprisingly, revert to a plain primitive:
$arr = array("a");
$a = &$arr[0];
// Do stuff you needed the reference for
unset($a);
$arr2 = $arr;
$arr2[0] = "b";
This code has no problem. As long as you don't need $a at the same time as $arr2, this is the least-messy solution to enable use of a terrible facet of PHP.
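One place this cleanup matters is a foreach loop that iterates by reference, because the loop variable stays bound to the last element after the loop ends. The sketch below uses made-up data ($prices) to show the unset() habit in that setting:

```php
<?php
$prices = [1, 2, 3];

foreach ($prices as &$price) {
    $price *= 2;  // modify each element in place
}
// $price is still a reference to $prices[2] at this point.
unset($price);    // drop it so the last slot goes back to a plain value

$backup = $prices;
$backup[2] = 0;   // only the backup changes

// $prices is still [2, 4, 6]
var_dump($prices);
```

Without the unset(), the write to $backup[2] would also land in $prices[2], for exactly the reason described above.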