
ENH: special object for decision rules. #126

Closed · wants to merge 2 commits

Conversation

@albop albop commented Dec 20, 2017

@sglyon : here is an attempt to resolve issue #125 .
One can do:

drs = DRStore([rand(32) for i=1:3])
drs.data :: NTuple{3,Vector{Float64}}
drs.flat :: Vector{Float64}
maximum(abs, cat(1, drs.data...)- drs.flat) # -> 0
drs.flat[10] = 1.2
maximum(abs, cat(1, drs.data...)- drs.flat) # -> 0

The field drs.data can contain the decision rule as we store it right now (almost: it is a tuple instead of a vector, so that it is immutable), and drs.flat represents it across all states as one single vector. The trick is that both fields refer to the same memory and are therefore always synchronized. I believe there is no memory-management problem, but that would require careful thinking.
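A minimal sketch of the aliasing trick described above, written in current Julia syntax (the actual DRStore in the PR may differ; all names here are illustrative):

```julia
# One flat vector plus per-group arrays wrapping the same memory.
dims = [32, 32, 32]
flat = zeros(Float64, sum(dims))
offsets = cumsum([0; dims[1:end-1]])           # element offsets: [0, 32, 64]
data = Tuple(unsafe_wrap(Array{Float64}, pointer(flat) + o * sizeof(Float64), d)
             for (o, d) in zip(offsets, dims))
# Note: unsafe_wrap does not root `flat`; it must be kept alive (e.g. as a
# field of the same struct) for the wrapped arrays to remain valid.
flat[10] = 1.2
@assert data[1][10] == 1.2                     # write to flat is visible in data
data[2][1] = 3.4
@assert flat[33] == 3.4                        # and vice versa
@assert maximum(abs, vcat(data...) .- flat) == 0
```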

I hesitated over how drs itself should behave. In the end, I implemented the behaviour from before (so drs[1] returns a vector) but didn't use it. The reason is backward compatibility, but I ended up not using drs directly, using drs.data instead most of the time.
The alternative would be for drs to behave like drs.flat, in which case it could derive from AbstractVector. I can see how we would get some operations for free, but I haven't found one very good reason to do it. An idea would be to do drs[i] for flat indexing and drs[G(i)] or something like that for group indexing.

So far I have only applied it to time_iteration_direct (memory 350 -> 20) and would like some early feedback before generalizing it. Any thoughts?

@sglyon sglyon left a comment

This is exciting!

I have a few concerns and questions that I think we should address before merging, but it is clear that moving in a direction like this can definitely lead to large memory efficiency gains.

function DRStore(::Type{T}, dims::AbstractVector{Int}) where T
L = sum(dims)
fD = zeros(T, L)
ptr = pointer(fD)
sglyon (Member) commented:

Yikes 😨! I don't think I've ever seen pointer and unsafe_wrap used out in the pure Julia wild (i.e. not interfacing with C libs).

Are we confident that this is safe and works correctly? I would guess that it does given your experiments, but it makes me a little scared.

What type of penalty would we get if we were to construct D using SubArrays (e.g. using view) instead of unsafe_wrap and pointer offsets?

@albop albop commented Dec 21, 2017:

I think the problem is not whether it is safe:

  • we control object construction, and the object is immutable, so the user cannot alter its behaviour. Of course this assumes we add consistency checks.
  • all arrays are linked to the same object, so one "view" cannot be destroyed without the others being destroyed. By construction, the memory we access should be freed for all arrays/subarrays at the same time.

One potential problem is actually performance. From my reptilian understanding of the discussion here (Get rid of reinterpret in its current form, JuliaLang/julia#22849), the compiler can optimize further if we guarantee non-aliasing. So we might slow things down a little bit with this approach (but not more than with our current use of reinterpret).

In principle, SubArray should work, and we should be able to pass them to all routines (including interpolation routines). Right now it doesn't:

  • our interpolation routines assume that inputs are contiguous arrays (i.e. they take Array as arguments). We rely on this to reinterpret arrays in both directions.
  • we cannot just change all arguments to AbstractArray. That would be inappropriate, because it doesn't give us any guarantee that we can reinterpret them.
  • SubArray has a flag which tells whether it is contiguous, but reinterpret doesn't work on it.

Intuitively, the condition for us to be able to reinterpret an array should not be that it is contiguous, but that all of its consecutive chunks are a multiple of the size of every datatype we wish to reinterpret it into. I think this is the direction followed in Julia 0.7, where reinterpret will return a special type of array. It would be natural for some SubArrays to be reinterpretable too, but I don't know whether this is the case yet.

Right now, I don't totally understand how the new reinterpret will work in 0.7, and it certainly won't be backward compatible. On the other hand, the current behaviour is arguably stupid, but certainly forward compatible, unless they remove pointer, which I'm sure they won't. Maybe there is a way to actually tell the compiler we are not aliasing anything. Overall, there are fewer factors for us to take into account with this simple solution, so I'm quite inclined to stay with it for now.

ptr = pointer(fD)
sz = sizeof(T)
offsets = [[0]; cumsum(dims)[1:end-1]]*sz
D = [unsafe_wrap(Array{T}, ptr+offsets[i], dims[i]) for i=1:length(dims)]
sglyon (Member) commented:

I think we should probably be testing that fD is the correct length, right?

unsafe_wrap will let us have elements of D defined over memory that extends beyond fD:

julia> fD = rand(4)
4-element Array{Float64,1}:
 0.36243
 0.0790668
 0.232033
 0.975159


julia> X = unsafe_wrap(Array{Float64}, pointer(fD), 5)
5-element Array{Float64,1}:
 0.36243
 0.0790668
 0.232033
 0.975159
 9.88131e-324

julia> fD[:] = 1
1

julia> X
5-element Array{Float64,1}:
 1.0
 1.0
 1.0
 1.0
 9.88131e-324

(they did warn us it was unsafe)

albop (Member Author) replied:

I don't see anything wrong with the value 9.88131e-324. I quite like it, actually.
(Joke aside: sure, we need to check dimensions.)
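The dimension check agreed on above could be as simple as the following sketch (function name and message are illustrative, not the PR's code):

```julia
# Hypothetical checked version of the wrapping step in the constructor:
# refuse to wrap if the flat vector and the group sizes disagree.
function checked_wrap(fD::Vector{T}, dims::AbstractVector{Int}) where T
    length(fD) == sum(dims) || throw(DimensionMismatch(
        "flat vector has length $(length(fD)); groups require $(sum(dims))"))
    offsets = cumsum([0; dims[1:end-1]]) .* sizeof(T)   # byte offsets
    [unsafe_wrap(Array{T}, pointer(fD) + o, d) for (o, d) in zip(offsets, dims)]
end

checked_wrap(zeros(4), [2, 2])      # ok: two groups of 2
# checked_wrap(zeros(4), [2, 3])    # throws DimensionMismatch
```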


DRStore! = DRStore

function Main.copy(ds::DRStore{T}) where T
sglyon (Member) commented:

This should be Base.copy

albop (Member Author) replied:

Sure. I followed the error message without thinking...

return ds1
end

Main.copy!(ds0::DRStore{T}, ds1::DRStore{T}) where T = copy!(ds0.flat, ds1.flat)
sglyon (Member) commented:

Base.copy!

Main.copy!(ds0::DRStore{T}, ds1::DRStore{T}) where T = copy!(ds0.flat, ds1.flat)


function Main.clamp!(ds::DRStore{T}, ds_lb::DRStore{T}, ds_ub::DRStore{T}) where T
sglyon (Member) commented:

Base.clamp!


while it<maxit && err>tol_η

it += 1

tic()

E_f = [zeros(Point{n_h},N) for i=1:number_of_smooth_drs(dprocess)]
set_values!(dr, [ds0.data...])
sglyon (Member) commented:

Is there any kind of penalty from the [ds0.data...] that appears throughout? (e.g. performance, memory, or type inference penalty)

This might be pushing things too far (and please tell me if it is), but would it make sense to have drs.data be NTuple{N,Vector{T}} so we know how many elements are in it and can use @generated functions to unroll these [drs.data...] calls?

albop (Member Author) replied:

For now I don't think we have any penalty. The elements of [ds0.data...] should be correctly inferred. set_values! itself could be improved to take a tuple as an argument. I doubt unrolling would provide much benefit, as the amount of code within the loop is quite big.

@albop albop commented Dec 21, 2017

Thanks for the comments. So, this seems like the right direction. I'll make two changes:

  • rename drs.data to drs.grouped so that the object is more neutral as to which is its natural layout.
  • remove the indexing functions

Then I'll wait and see whether I'm a hundred percent convinced before deriving from AbstractArray{T}.
Any ideas for a good name for a function to access drs.grouped[i]? συβ(drs, i)?

@sglyon sglyon commented Dec 21, 2017

What if we did drs[i, :] for drs.grouped[i] (or the colon in the first position -- whichever feels more natural to you)?
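The mixed indexing suggested above could be wired up along these lines (a hypothetical sketch; the struct and methods here are not what the PR implements):

```julia
# Group indexing via drs[i, :], flat indexing via drs[i].
struct DRS{N,T}
    grouped::NTuple{N,Vector{T}}
    flat::Vector{T}
end
Base.getindex(d::DRS, i::Int, ::Colon) = d.grouped[i]   # whole group i
Base.getindex(d::DRS, i::Int) = d.flat[i]               # i-th flat element

drs = DRS(([1.0, 2.0], [3.0, 4.0]), [1.0, 2.0, 3.0, 4.0])
@assert drs[2, :] == [3.0, 4.0]
@assert drs[3] == 3.0
```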

@albop albop commented Dec 21, 2017 via email

@albop albop commented Dec 21, 2017 via email

@sglyon sglyon commented Dec 21, 2017

Ok sounds great!

@albop albop commented Dec 21, 2017

Quick question: what is a nice simple way to do a[:] += b[:] without allocating?

@sglyon sglyon commented Dec 21, 2017

I believe that a .+= b would work.

You can always use the trusty axpy! BLAS routine:

julia> a = [1, 2, 3.0]
3-element Array{Float64,1}:
 1.0
 2.0
 3.0

julia> b = [10, 10, 10.0]
3-element Array{Float64,1}:
 10.0
 10.0
 10.0

julia> Base.axpy!(1, b, a)
3-element Array{Float64,1}:
 11.0
 12.0
 13.0

julia> a
3-element Array{Float64,1}:
 11.0
 12.0
 13.0

It looks like the broadcasting version is faster:

julia> function f1(a, b)
       a .+= b
       end
f1 (generic function with 1 method)

julia> function f2(a, b)
       Base.axpy!(1, b, a)
       end
f2 (generic function with 1 method)

julia> a = rand(1000);

julia> b = rand(1000);

julia> f1(a, b); @allocated f1(a, b)
96

julia> f2(a, b); @allocated f2(a, b)
0

julia> using BenchmarkTools

julia> @benchmark f1(a, b)
BenchmarkTools.Trial:
  memory estimate:  96 bytes
  allocs estimate:  4
  --------------
  minimum time:     341.165 ns (0.00% GC)
  median time:      353.771 ns (0.00% GC)
  mean time:        394.723 ns (2.26% GC)
  maximum time:     13.852 μs (94.29% GC)
  --------------
  samples:          10000
  evals/sample:     218

julia> @benchmark f2(a, b)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.381 μs (0.00% GC)
  median time:      1.403 μs (0.00% GC)
  mean time:        1.541 μs (0.00% GC)
  maximum time:     74.430 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     10

@albop albop commented Dec 21, 2017 via email

@sglyon sglyon commented Dec 21, 2017

Yeah, I'm sure about it.

a .+= b is syntactic sugar for broadcast!(+, a, a, b), which says (in the order the arguments appear in the call): broadcast the + function, store the result in a, and apply + to the arguments a and b.

julia> expand(:(a .+= b))
:((Base.broadcast!)(+, a, a, b))

@albop albop commented May 10, 2020

Getting back to it, not understanding much. Looks like my former self was a better version of me.

@albop albop commented Jan 4, 2022

This is now redundant, as we are using (hopefully fast) views.

@albop albop closed this Jan 4, 2022