MATLAB’s inefficient copy-on-write implementation
MATLAB’s copy-on-write memory management seems to have a serious defect, which I think is the reason behind the abysmal performance of subsasgn overloading. (The same problem probably occurs with parenAssign in the new R2021b RedefinesParen class — I haven’t yet experimented with it.) Normally, an array assignment like b = a simply does a pointer copy; the array data is not copied until b is modified (e.g. b(1) = 1). Thereafter, subsequent modification of b (e.g. b(2) = 1) do not copy the full array; they just modify it in place as long as the reference count is 1. For example,
clear, a = zeros(1e8,1);
memory % 2764 MB used by MATLAB
b = a;
memory % 2764 MB
tic, b(1) = 1; toc, memory % 0.329099 seconds, 3540 MB
tic, b(2) = 1; toc, memory % 0.000123 seconds, 3541 MB
However, the benefit of copy-on-write is lost when the variable is changed in a function, e.g.
% test.m
function x = test(x)
x(1) = 1;
In this case, the x reference count is apparently incremented in test before the assignment is made, so this will always result in a full array copy. For example,
clear, a = zeros(1e8,1);
tic, a = test(a); toc % 0.337475 seconds
tic, a = test(a); toc % 0.310373 seconds
To see what’s happening with copy-on-write, test.m is modified as follows:
function x = test(x)
memory
x(1) = 1;
memory
return
The array modification inside the function forces a full array copy, even though the original array is immediately discarded:
clear, a = zeros(1e8,1);
memory % 2748 MB
a = test(a); % 2748 MB, 3503 MB
memory % 2740 MB
I would think this problem could be easily avoided by treating any variable that appears as both an input and output argument in a function (e.g. function x = test(x)) as a reference variable, i.e. its reference count is not incremented on entering the function and is not decremented upon exiting. If the function is called with different input and output arguments, e.g. y = test(x), then the interpreter would implement this as y = x; y = test(y).
Is there any particular reason why MATLAB does not or cannot do this? There are many applications such as subasgn overloading that could see a big performance boost if this problem is fixed.MATLAB’s copy-on-write memory management seems to have a serious defect, which I think is the reason behind the abysmal performance of subsasgn overloading. (The same problem probably occurs with parenAssign in the new R2021b RedefinesParen class — I haven’t yet experimented with it.) Normally, an array assignment like b = a simply does a pointer copy; the array data is not copied until b is modified (e.g. b(1) = 1). Thereafter, subsequent modification of b (e.g. b(2) = 1) do not copy the full array; they just modify it in place as long as the reference count is 1. For example,
clear, a = zeros(1e8,1);
memory % 2764 MB used by MATLAB
b = a;
memory % 2764 MB
tic, b(1) = 1; toc, memory % 0.329099 seconds, 3540 MB
tic, b(2) = 1; toc, memory % 0.000123 seconds, 3541 MB
However, the benefit of copy-on-write is lost when the variable is changed in a function, e.g.
% test.m
function x = test(x)
x(1) = 1;
In this case, the x reference count is apparently incremented in test before the assignment is made, so this will always result in a full array copy. For example,
clear, a = zeros(1e8,1);
tic, a = test(a); toc % 0.337475 seconds
tic, a = test(a); toc % 0.310373 seconds
To see what’s happening with copy-on-write, test.m is modified as follows:
function x = test(x)
memory
x(1) = 1;
memory
return
The array modification inside the function forces a full array copy, even though the original array is immediately discarded:
clear, a = zeros(1e8,1);
memory % 2748 MB
a = test(a); % 2748 MB, 3503 MB
memory % 2740 MB
I would think this problem could be easily avoided by treating any variable that appears as both an input and output argument in a function (e.g. function x = test(x)) as a reference variable, i.e. its reference count is not incremented on entering the function and is not decremented upon exiting. If the function is called with different input and output arguments, e.g. y = test(x), then the interpreter would implement this as y = x; y = test(y).
Is there any particular reason why MATLAB does not or cannot do this? There are many applications such as subasgn overloading that could see a big performance boost if this problem is fixed. MATLAB’s copy-on-write memory management seems to have a serious defect, which I think is the reason behind the abysmal performance of subsasgn overloading. (The same problem probably occurs with parenAssign in the new R2021b RedefinesParen class — I haven’t yet experimented with it.) Normally, an array assignment like b = a simply does a pointer copy; the array data is not copied until b is modified (e.g. b(1) = 1). Thereafter, subsequent modification of b (e.g. b(2) = 1) do not copy the full array; they just modify it in place as long as the reference count is 1. For example,
clear, a = zeros(1e8,1);
memory % 2764 MB used by MATLAB
b = a;
memory % 2764 MB
tic, b(1) = 1; toc, memory % 0.329099 seconds, 3540 MB
tic, b(2) = 1; toc, memory % 0.000123 seconds, 3541 MB
However, the benefit of copy-on-write is lost when the variable is changed in a function, e.g.
% test.m
function x = test(x)
x(1) = 1;
In this case, the x reference count is apparently incremented in test before the assignment is made, so this will always result in a full array copy. For example,
clear, a = zeros(1e8,1);
tic, a = test(a); toc % 0.337475 seconds
tic, a = test(a); toc % 0.310373 seconds
To see what’s happening with copy-on-write, test.m is modified as follows:
function x = test(x)
memory
x(1) = 1;
memory
return
The array modification inside the function forces a full array copy, even though the original array is immediately discarded:
clear, a = zeros(1e8,1);
memory % 2748 MB
a = test(a); % 2748 MB, 3503 MB
memory % 2740 MB
I would think this problem could be easily avoided by treating any variable that appears as both an input and output argument in a function (e.g. function x = test(x)) as a reference variable, i.e. its reference count is not incremented on entering the function and is not decremented upon exiting. If the function is called with different input and output arguments, e.g. y = test(x), then the interpreter would implement this as y = x; y = test(y).
Is there any particular reason why MATLAB does not or cannot do this? There are many applications such as subasgn overloading that could see a big performance boost if this problem is fixed. memory management, copy-on-write, subsasgn, parenassign MATLAB Answers — New Questions