July 6, 2008
Over on Will Tomlinson's blog there's a piece about using structCopy() to create a copy of a struct, and a note from Charlie Griefer cautioning that, for Will's example, he probably needed to use duplicate() instead. After discussing this with Will on IM, I figured it might be instructive to look at how structCopy() differs from duplicate() and why you might use it instead.

First off, I think this causes confusion for a lot of CFers because many of them don't have a Computer Science background, so they've never taken the "Memory and Pointers 101" course that makes this stuff a lot clearer. Hopefully, this blog post will help fill in some of the gaps.

Some basics. When you assign something to a variable in CFML, you are really doing two things: you are creating a label (the variable name) and you are associating that label with some data stored in memory. In particular, with structs, the struct itself exists in a block of memory (well, lots of connected blocks of memory) and the variable "points to" that struct data.

Let's start with the simplest example (these examples require ColdFusion 8.0.1):

<cfscript>
   writeOutput("Assign (by reference):<br />");
   var1 = { a = 1, b = { c = 2 }, d = 3 };
   var2 = var1;
   var2.a = 4; // affects both
   dump(var1=var1,var2=var2);
</cfscript>
Assume a dump function like this:
<cffunction name="dump">
   <cfdump label="arguments" var="#arguments#" />
</cffunction>
OK, so in the above code, var1 points to a struct that contains two top-level keys (a, d) and a nested struct (b, which points to a struct containing c). When the assignment (var2 = var1;) is executed, var2 is made to point to the same thing that var1 points at. In other words, var1 and var2 are synonyms. When you change the struct data through var2, it modifies the single, shared copy of the struct.
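It may help to see the same idea in another language. The following is a rough analogy in Python, not CFML. A Python dict stands in for a CF struct, and Python's `is` operator checks whether two names refer to the very same object. Plain assignment behaves just like the CFML example:

```python
# Python analogy of the "Assign (by reference)" example (not ColdFusion code).
# A dict stands in for a CF struct.
var1 = {"a": 1, "b": {"c": 2}, "d": 3}
var2 = var1            # var2 now labels the very same dict as var1
var2["a"] = 4          # a change made through var2...

print(var1["a"])       # ...is visible through var1: prints 4
print(var1 is var2)    # True: one dict, two labels
```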

Now, let's look at a common idiom:

<cfscript>
   writeOutput("StructNew/StructAppend (equivalent to structCopy):<br />");
   var1 = { a = 1, b = { c = 2 }, d = 3 };
   var2 = structNew();
   structAppend(var2,var1);
   var2.a = 4; // does not affect var1
   dump(var1=var1,var2=var2);
   var2.b.c = 5; // affects both
   dump(var1=var1,var2=var2);
</cfscript>
In this code, var1 points to the struct data and var2 is set to point to a new, empty struct. The structAppend() call copies the (top-level) elements from the first struct to the second struct. At that point, var1 points to a struct that contains two top-level keys (a, d) and a nested struct (b, which points to a struct containing c); var2 points to a (separate) struct that contains two (new) top-level keys (a, d) and a nested struct (b, which points to the same struct data as the first b under var1). Let's look at that again: structAppend() copies in a, b and d as keys to the new struct as if it had done:
   var2.a = var1.a;
   var2.b = var1.b;
   var2.d = var1.d;
We can see that var2.b is made to point to the same struct data as var1.b just as the direct assignment of var1 to var2 did in the first example above. When we assign var2.a = 4; we are updating the value associated with the key a in the struct pointed to by var2. Since a was copied into var2, it's a different key entry to the a in var1. When we assign var2.b.c = 5; we are reaching into the shared struct data that both var1.b and var2.b point to and updating it. That's why that change appears in both dumps - because there's only one instance of that struct, pointed to by both of the top-level structs.
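To make the "as if it had done" assignments concrete, here is the same key-by-key copy as a Python analogy (again, not CFML, with a dict standing in for a struct). Immediately after the copy, every key in the new dict refers to the same object as the corresponding key in the original. Rebinding a key only affects one dict. Mutating the shared nested dict shows up in both:

```python
# Python analogy (not CFML): structNew() + structAppend(var2, var1),
# spelled out as var2.a = var1.a; var2.b = var1.b; var2.d = var1.d
var1 = {"a": 1, "b": {"c": 2}, "d": 3}

var2 = {}                          # like structNew(): a new, empty container
for key, value in var1.items():
    var2[key] = value              # copies the reference stored under each key

# Right after the copy, each value in var2 is the same object as in var1.
print(all(var2[k] is var1[k] for k in var1))   # True

var2["a"] = 4        # rebinds var2's own "a" entry; var1["a"] stays 1
var2["b"]["c"] = 5   # mutates the one nested dict that both point to

print(var1)   # {'a': 1, 'b': {'c': 5}, 'd': 3}
print(var2)   # {'a': 4, 'b': {'c': 5}, 'd': 3}
```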

Now, what about structCopy()? It does exactly what the structNew() / structAppend() combination does. It creates a new top-level structure and populates it with the keys from the original struct. Any nested structs (or objects) will end up being shared between the original and the "copy". Here:

<cfscript>
   writeOutput("StructCopy:<br />");
   var1 = { a = 1, b = { c = 2 }, d = 3 };
   var2 = structCopy(var1);
   var2.a = 4; // does not affect var1
   dump(var1=var1,var2=var2);
   var2.b.c = 5; // affects both
   dump(var1=var1,var2=var2);
</cfscript>
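In Python terms, this kind of copy is a "shallow copy", which `copy.copy()` performs on a dict. The sketch below is an analogy only. One difference is worth flagging: the CF documentation says structCopy() copies top-level arrays by value, whereas a Python shallow copy shares lists as well. So the analogy holds for nested structs but not for arrays.

```python
import copy

# Python analogy (not CFML): copy.copy() on a dict makes a shallow copy,
# mirroring how structCopy() treats nested structs: a new top-level
# container whose values are the same objects as the original's.
var1 = {"a": 1, "b": {"c": 2}, "d": 3}
var2 = copy.copy(var1)           # roughly structCopy(var1)

var2["a"] = 4                    # does not affect var1
var2["b"]["c"] = 5               # affects both: one shared nested dict

print(var1 is var2)              # False: two top-level containers
print(var1["b"] is var2["b"])    # True: one shared nested dict
print(var1["a"], var1["b"]["c"]) # 1 5
```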

If you want a complete, separate copy, you need to use duplicate() which will walk the entire data structure and create a brand new copy of every level within it:

<cfscript>
   writeOutput("Duplicate:<br />");
   var1 = { a = 1, b = { c = 2 }, d = 3 };
   var2 = duplicate(var1);
   var2.a = 4; // does not affect var1
   dump(var1=var1,var2=var2);
   var2.b.c = 5; // does not affect var1
   dump(var1=var1,var2=var2);
</cfscript>
The duplicate() call not only copies the top-level struct (as structCopy() does) but also the nested struct, so var2.b points to a new struct that is a copy of var1.b. Thus var2.b.c is completely separate from var1.b.c.
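The Python counterpart of a full, every-level copy is `copy.deepcopy()`. As before, this is an analogy rather than ColdFusion code. It shows that neither change made through var2 leaks back into var1, because the nested dict was copied too:

```python
import copy

# Python analogy (not CFML): copy.deepcopy() walks the whole structure
# and copies every level, much as duplicate() does.
var1 = {"a": 1, "b": {"c": 2}, "d": 3}
var2 = copy.deepcopy(var1)       # roughly duplicate(var1)

var2["a"] = 4                    # does not affect var1
var2["b"]["c"] = 5               # does not affect var1 either

print(var1)                      # {'a': 1, 'b': {'c': 2}, 'd': 3}
print(var2)                      # {'a': 4, 'b': {'c': 5}, 'd': 3}
print(var1["b"] is var2["b"])    # False: the nested dict was copied too
```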

As the documentation for structCopy() says, it "Copies top-level keys, values, and arrays in the structure by value; copies nested structures by reference." It goes on to say "To copy a structure entirely by value, use Duplicate." So why would you ever use structCopy()? You probably don't need it very often, but bear in mind that it behaves just like structNew() followed by structAppend(), and think about how often you use structAppend(). If you have a struct containing CFCs, you may well not want to duplicate() the CFCs (remember: duplicate() does a full deep copy of CFCs now, which won't be correct if your CFC refers to a singleton, e.g., TransferObject CFCs if you're using Transfer). If your struct is just a container for data and you don't need the data itself to be copied (i.e., a new copy made), then structCopy() is what you want. If your struct can contain CFCs, think very carefully about the impact of using duplicate() - again, structCopy() may be what you want.
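The singleton problem can be sketched in Python, with plain objects standing in for CFCs. `Registry` and `Record` are hypothetical classes: `Registry` plays the part of an application-wide singleton (the sort of thing a TransferObject refers back to), and `Record` plays the part of a CFC that holds a reference to it. Treating Python's `copy.deepcopy()` as a stand-in for duplicate() on CFCs is an assumption made for illustration.

```python
import copy

class Registry:
    """Hypothetical stand-in for an application-wide singleton."""
    pass

class Record:
    """Hypothetical stand-in for a CFC that refers to the singleton."""
    def __init__(self, registry):
        self.registry = registry

APP_REGISTRY = Registry()                        # the one shared instance
container = {"user": Record(APP_REGISTRY), "count": 1}

shallow = copy.copy(container)   # like structCopy(): new container, same objects
deep = copy.deepcopy(container)  # like duplicate(): copies the objects as well

# The shallow copy still reaches the real singleton...
print(shallow["user"].registry is APP_REGISTRY)  # True
# ...but the deep copy dragged along a private clone of it.
print(deep["user"].registry is APP_REGISTRY)     # False
```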

Understanding copy-by-value vs copy-by-reference is very important when dealing with complex data structures.

Comments

hey sean:

this is very cool. thanks for taking the time to write it out.

i *thought* i had a handle on the structure stuff (see blog post from a year ago @ http://charlie.griefer.com/blog/index.cfm/2007/2/7/structures-in-coldfusion), but i see that i really didn't grasp the difference between structCopy() and just doing struct1 = struct2;. apparently i didn't really have a firm grasp on how structAppend() works either.

the docs also led me to believe that structCopy() was really more or less obsolete ("the duplicate() function replaces structCopy() for most, if not all, purposes."). i figured for deep copies, you had duplicate(). for shallow copies, you could just do struct1 = struct2. i thought that structCopy() existed in the same realm as parameterExists() (to maintain backward compatibility).

going to have to experiment with some of the code you put out here. thanks again :)


@Sean,

I like the parallel to StructNew() / StructAppend(). That really goes a long way for an understanding of how StructCopy() works. Most excellent.


Thanks for taking the time to blog this Sean. I'm sure it'll help more than just Charlie and myself. :)


Sean,

Are any other ColdFusion objects besides structs assigned by reference? If I say (cfset MyNewArray = MyOldArray), do I have 2 Arrays or 2 references to the same array? What about (cfset MyNewQuery = MyOldQuery)?

I'm a big fan of writing "sleuth" routines (experiments to see what happens), such as you gave in this article. But is there a table somewhere that lists what's assigned by reference and what instantiates a new object? No sense in going through all that if an authoritative list exists somewhere. Do you know of one?


@Steve, CFCs are assigned by reference, arrays are assigned by shallow copy (a new array is constructed and then each element is assigned so when an array of CFCs is assigned, you get a new array containing the same CFC instances, just like structCopy() does for structs).

Not sure about queries but I think they're copied by reference...
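The array behaviour described in this reply (a new array containing the same CFC instances) can be modelled in Python, with one caveat. Python lists are assigned by reference, so the new list has to be built explicitly with `list(old)` to mimic CF's assignment. `Widget` is a hypothetical stand-in for a CFC.

```python
# Python analogy (not CFML) of assigning an array of CFCs in CF.
class Widget:
    """Hypothetical stand-in for a CFC instance."""
    def __init__(self, name):
        self.name = name

old_array = [Widget("first"), Widget("second")]
new_array = list(old_array)          # models <cfset newArray = oldArray>

new_array.append(Widget("third"))    # the arrays themselves are separate...
new_array[0].name = "renamed"        # ...but the elements are shared instances

print(len(old_array), len(new_array))   # 2 3
print(old_array[0].name)                # renamed
print(old_array[0] is new_array[0])     # True
```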


@Steve, Sean -

Queries are in fact by reference, as are instances of Java objects and XML documents. Not sure about .NET proxy objects, but I'd be surprised if they weren't by-reference.

I find it safer to think in terms of "simple values (incl. Dates) and arrays are by-value, everything else is by-reference" than to try to remember all the different types in modern CF :)

-Joe


@Joe, good point (re: all the different types).

Thinking of arrays as by-value is a good starting point but there are some interesting gotchas there. I'll blog about that separately.


Sean and Joe,

The events that brought this on were server hangs. We had some serious locking problems that would accumulate and lock up one of our CF Servers. At that time I discovered that cfincludes were single-threaded, because many processes were hanging on runPage locks. Apparently, runPage is a synchronized method. (Or maybe it's the cfclass file that's being used for locking, to lock just that cfincluded file.) Also, one of the hangs was traced to a lock that occurred in a Duplicate of a struct.

Like your CFConcurrency, I'm taking a long hard look at Java's new concurrent package, Session scope variables, object sharing, locking, memory usage, etc. It sounds as if there isn't any table showing which types are assigned by reference, by shallow copy or by deep copy. I think I may need to build one up via sleuth routines.

I've discovered some scary things, and may end up giving a cautionary presentation at one of the big conferences one of these days.

Steve

