First off, let me say that the reason I think this causes confusion for a lot of CFers is that they don't have a Computer Science background so they've not had the "Memory and Pointers 101" course that makes this stuff a lot clearer. Hopefully, this blog post will help fill in some of the gaps.
Some basics. When you assign something to a variable in CFML, you are really doing two things: you are creating a label (the variable name) and you are allocating some memory to associate the label with the data. In particular, with structs, the struct itself exists in a block of memory (well, lots of connected blocks of memory) and then the variable "points to" the struct data.
Let's start with the simplest example (these examples require ColdFusion 8.0.1):
var1 = { a = 1, b = { c = 2 }, d = 3 };
var2 = var1;
var2.a = 4; // affects both
dump(var1=var1,var2=var2);
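The dump() helper used in these examples is a small function that wraps <cfdump>:
<cffunction name="dump">
<cfdump label="arguments" var="#arguments#" />
</cffunction>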
Now, let's look at a common idiom:
var1 = { a = 1, b = { c = 2 }, d = 3 };
var2 = structNew();
structAppend(var2,var1);
var2.a = 4; // does not affect var1
dump(var1=var1,var2=var2);
var2.b.c = 5; // affects both
dump(var1=var1,var2=var2);
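structAppend() copies each top-level key across, so the call above is roughly equivalent to:
var2.a = var1.a;
var2.b = var1.b;
var2.d = var1.d;
Because var1.b is a struct, var2.b ends up pointing to the same nested struct as var1.b.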
Now, what about structCopy()? It does exactly what the structNew() / structAppend() combination does. It creates a new top-level structure and populates it with the keys from the original struct. Any nested structs (or objects) will end up being shared between the original and the "copy". Here:
var1 = { a = 1, b = { c = 2 }, d = 3 };
var2 = structCopy(var1);
var2.a = 4; // does not affect var1
dump(var1=var1,var2=var2);
var2.b.c = 5; // affects both
dump(var1=var1,var2=var2);
If you want a complete, separate copy, you need to use duplicate() which will walk the entire data structure and create a brand new copy of every level within it:
var1 = { a = 1, b = { c = 2 }, d = 3 };
var2 = duplicate(var1);
var2.a = 4; // does not affect var1
dump(var1=var1,var2=var2);
var2.b.c = 5; // does not affect var1
dump(var1=var1,var2=var2);
As the documentation for structCopy() says: "Copies top-level keys, values, and arrays in the structure by value; copies nested structures by reference." It goes on to say: "To copy a structure entirely by value, use Duplicate."
So why would you ever use structCopy()? You probably don't need it very often, but bear in mind how it works compared to structAppend() and how often you use that function. If you have a struct containing CFCs, you may well not want to duplicate() the CFCs (remember: duplicate() now does a full deep copy of CFCs, which won't be correct if your CFC refers to a singleton, e.g., TransferObject CFCs if you're using Transfer). If your struct is just a container for data and you don't need the data itself to be copied (i.e., a new copy made), then structCopy() is what you want. If your struct can contain CFCs, think very carefully about the impact of using duplicate(); again, structCopy() may be what you want.
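To make the documentation quote above concrete, here is a minimal sketch of structCopy() on a struct that holds both an array and a nested struct. The key names (tags, nested) are made up for illustration. The behaviour noted in the comments is what the quoted documentation describes for Adobe ColdFusion; other CFML engines may treat arrays differently, so run it and check the output on your own server.

```cfml
<cfscript>
// Hypothetical sample data: one simple value, one array, one nested struct.
var1 = { a = 1, tags = [ "x", "y" ], nested = { c = 2 } };
var2 = structCopy(var1);

// Arrays: the documentation says these are copied by value,
// so this append should only change var2's array.
arrayAppend(var2.tags, "z");

// Nested structs: the documentation says these are copied by reference,
// so this assignment should be visible through var1 as well.
var2.nested.c = 5;

// Print what var1 looks like after changing only var2.
writeOutput("var1.tags length: " & arrayLen(var1.tags) & "<br>");
writeOutput("var1.nested.c: " & var1.nested.c & "<br>");
</cfscript>
```

If the array length reported for var1 is unchanged while var1.nested.c shows the new value, the two halves of the documentation's statement are both at work in the same "copy".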
Understanding copy-by-value vs copy-by-reference is very important when dealing with complex data structures.
this is very cool. thanks for taking the time to write it out.
i *thought* i had a handle on the structure stuff (see blog post from a year ago @ http://charlie.griefer.com/blog/index.cfm/2007/2/7/structures-in-coldfusion), but i see that i really didn't grasp the difference between structCopy() and just doing struct1 = struct2;. apparently i didn't really have a firm grasp on how structAppend() works either.
the docs also led me to believe that structCopy() was really more or less obsolete ("the duplicate() function replaces structCopy() for most, if not all, purposes."). i figured for deep copies, you had duplicate(). for shallow copies, you could just do struct1 = struct2. i thought that structCopy() existed in the same realm as parameterExists() (to maintain backward compatibility).
going to have to experiment with some of the code you put out here. thanks again :)
I like the parallel to StructNew() / StructAppend(). That really goes a long way for an understanding of how StructCopy() works. Most excellent.
Are any other ColdFusion objects besides structs assigned by reference? If I say (cfset MyNewArray = MyOldArray), do I have 2 Arrays or 2 references to the same array? What about (cfset MyNewQuery = MyOldQuery)?
I'm a big fan of writing "sleuth" routines (experiments to see what happens), such as you gave in this article. But is there a table somewhere that lists what's assigned by reference and what instantiates a new object? No sense in going through all that if an authoritative list exists somewhere. Do you know of one?
Not sure about queries but I think they're copied by reference...
Queries are in fact by reference, as are instances of Java objects and XML documents. Not sure about .NET proxy objects, but I'd be surprised if they weren't by-reference.
I find it safer to think in terms of "simple values (incl. Dates) and arrays are by-value, everything else is by-reference" than to try to remember all the different types in modern CF :)
-Joe
Thinking of arrays as by-value is a good starting point but there are some interesting gotchas there. I'll blog about that separately.
The events that brought this on were server hangs. We had some serious locking problems that would accumulate and lock up one of our CF Servers. At that time I discovered that cfincludes were single-threaded, because many processes were hanging on runPage locks. Apparently, runPage is a synchronized method. (Or maybe it's the cfclass file that's being used for locking, to lock just that cfincluded file.) Also, one of the hangs was traced to a lock that occurred in a Duplicate of a struct.
Like your CFConcurrency, I'm taking a long hard look at Java's new concurrent package, Session scope variables, object sharing, locking, memory usage, etc. It sounds as if there isn't any table of reference/shallowcopy/deepcopy. I think I may need to accumulate one via sleuth routines.
I've discovered some scary things, and may end up giving a cautionary presentation at one of the big conferences one of these days.
Steve


