Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(june24) check if anything needs to be changed in dsample.f in the use of vecsize_used or nb_warp_used #983

Closed
valassi opened this issue Aug 30, 2024 · 4 comments
Assignees

Comments

@valassi
Copy link
Member

valassi commented Aug 30, 2024

Just a reminder for myself from the chat with @oliviermattelaer this afternoon

=> Check if vecsize_used and nb_warp_used are needed in dsample.f

@valassi
Copy link
Member Author

valassi commented Sep 2, 2024

Hi @oliviermattelaer I am looking at this.

These are currently the gg_tt.mad versions (after removing dead code and after indenting)

  • old means upstream/master
  • new means current june24

Old upstream/master
dsample.f.new.txt

c
c     Main Integration Loop
c
      ievent = 0
      iter = 1
      ivec = 0
      do while(iter .le. itmax)
c
c     Get integration point
c
        call sample_get_config(wgt,iter,ipole)
        if (iter .le. itmax) then
          ievent=ievent+1
          call x_to_f_arg(ndim,ipole,mincfig,maxcfig,ninvar,wgt,x,p)
          CUTSDONE=.FALSE.
          CUTSPASSED=.FALSE.
          if (passcuts(p,VECSIZE_USED)) then
            ivec=ivec+1
            all_p(:,ivec) = p(:)
            all_wgt(ivec) = wgt
            all_x(:,ivec) = x(:)
            all_xbk(:, ivec) = xbk(:)
            all_q2fact(:, ivec) = q2fact(:)
            all_cm_rap(ivec) = cm_rap
            all_lastbin(:, ivec) = lastbin(:)
            if (ivec.lt.VECSIZE_USED)then
              cycle
            endif
            ivec=0
            if (VECSIZE_USED.le.1) then
              all_fx(1) = dsig(all_p, all_wgt,0)
            else
              do i=1, VECSIZE_USED
c               need to restore common block                  
                xbk(:) = all_xbk(:, i)
                cm_rap = all_cm_rap(i)
                q2fact(:) = all_q2fact(:,i)
                CUTSDONE=.TRUE.
                CUTSPASSED=.TRUE.
                call prepare_grouping_choice(all_p(1,i), all_wgt(i), i.eq.1)
              enddo
              call select_grouping(imirror, iproc, iconf, all_wgt, VECSIZE_USED)
              call dsig_vec(all_p, all_wgt, all_xbk, all_q2fact, all_cm_rap,
     &          iconf, iproc, imirror, all_fx,VECSIZE_USED)
              do i=1, VECSIZE_USED
c               need to restore common block                  
                xbk(:) = all_xbk(:, i)
                cm_rap = all_cm_rap(i)
                q2fact(:) = all_q2fact(:,i)
              enddo
            endif
            do I=1, VECSIZE_USED
              all_wgt(i) = all_wgt(i)*all_fx(i)
            enddo
            do i =1, VECSIZE_USED
c             if last paremeter is true -> allow grid update so only for a full page
              lastbin(:) = all_lastbin(:,i)
              if (all_wgt(i) .ne. 0d0) kevent=kevent+1
              call sample_put_point(all_wgt(i),all_x(1,i),iter,ipole, i.eq.VECSIZE_USED) !Store result
            enddo
            if (VECSIZE_USED.ne.1.and.force_reset)then
              call reset_cumulative_variable()
              force_reset=.false.
            endif
          else
            fx =0d0
            wgt=0d0
            call sample_put_point(wgt,x(1),iter,ipole,.true.) !Store result
          endif
        endif
c
c     Write out progress/histograms
c
        if (kevent .ge. nwrite) then
          nwrite = nwrite+ncall*itmax/nsteps
          nwrite = min(nwrite,ncall*itmax)
          call graph_store
        endif
 99   enddo

New current june24
dsample.f.old.txt

c
c     Main Integration Loop
c
      ievent = 0
      iter = 1
      ivec = 0
      ilock = 0
      iwarp = 1
      do while(iter .le. itmax)
c
c     Get integration point
c
        call sample_get_config(wgt,iter,ipole)
        if (iter .le. itmax) then
          ievent=ievent+1
          call x_to_f_arg(ndim,ipole,mincfig,maxcfig,ninvar,wgt,x,p)
          CUTSDONE=.FALSE.
          CUTSPASSED=.FALSE.
          if (passcuts(p,VECSIZE_USED)) then
            ivec=ivec+1
            ilock = ilock+1
            if (ilock.gt.WARP_SIZE)then
              ilock = 1
              iwarp = iwarp +1
            endif
            all_p(:,ivec) = p(:)
            all_wgt(ivec) = wgt
            all_x(:,ivec) = x(:)
            all_xbk(:, ivec) = xbk(:)
            all_q2fact(:, ivec) = q2fact(:)
            all_cm_rap(ivec) = cm_rap
            all_lastbin(:, ivec) = lastbin(:)
            if (ilock.ne.WARP_SIZE)then
              cycle
            endif            
            if (VECSIZE_USED.le.1) then
              all_fx(1) = dsig(all_p, all_wgt,0)
            else
c             Here "i" is the position in the full grid of the event                  
              do i=(iwarp-1)*WARP_SIZE+1, iwarp*warp_size
c               need to restore common block                  
                xbk(:) = all_xbk(:, i)
                cm_rap = all_cm_rap(i)
                q2fact(:) = all_q2fact(:,i)
                CUTSDONE=.TRUE.
                CUTSPASSED=.TRUE.
                call prepare_grouping_choice(all_p(1,i), all_wgt(i),i.eq.(iwarp-1)*WARP_SIZE+1)
              enddo
              call select_grouping(imirror_vec(iwarp), iproc, iconf_vec(iwarp), all_wgt, iwarp)
              if (ivec.lt.VECSIZE_USED)then
                cycle
              endif
c             reset variable for the next grid               
              ivec = 0
              ilock = 0
              iwarp =1
              call dsig_vec(all_p, all_wgt, all_xbk, all_q2fact, all_cm_rap,
     &          iconf_vec, iproc, imirror_vec, all_fx,VECSIZE_USED)
              do i=1, VECSIZE_USED
c               need to restore common block                  
                xbk(:) = all_xbk(:, i)
                cm_rap = all_cm_rap(i)
                q2fact(:) = all_q2fact(:,i)
              enddo
            endif
            do I=1, VECSIZE_USED
              all_wgt(i) = all_wgt(i)*all_fx(i)
            enddo
            do i =1, VECSIZE_USED
c             if last paremeter is true -> allow grid update so only for a full page
              lastbin(:) = all_lastbin(:,i)
              if (all_wgt(i) .ne. 0d0) kevent=kevent+1
              call sample_put_point(all_wgt(i),all_x(1,i),iter,ipole, i.eq.VECSIZE_USED) !Store result
            enddo
            if (VECSIZE_USED.ne.1.and.force_reset)then
              call reset_cumulative_variable()
              force_reset=.false.
            endif
          else
            fx =0d0
            wgt=0d0
            call sample_put_point(wgt,x(1),iter,ipole,.true.) !Store result
          endif
        endif
c
c     Write out progress/histograms
c
        if (kevent .ge. nwrite) then
          nwrite = nwrite+ncall*itmax/nsteps
          nwrite = min(nwrite,ncall*itmax)
          call graph_store
        endif
 99   enddo

@valassi
Copy link
Member Author

valassi commented Sep 2, 2024

Hi @oliviermattelaer I had a look at the two pieces of code above.

My fortran is rusty (and it took me quite some time to understand the logic of the lagorithm), but all in all it looks correct, in both versions?

Rephrasing: is there something that you think should be changed?

To be specific:

  • NB_WARP is never used in the algorith, so there is no need for NB_WARP_USED
  • The logic which creates a new bunch of events every VECSIZE_USED in the old code remains the same in the new code, so I do not see a problem?

Even more specifically, the magic is here and is the same in old and new code

              if (ivec.lt.VECSIZE_USED)then
                cycle
              endif
c             reset variable for the next grid               
              ivec = 0

Until ivec<vecsize_used, only the part above is done (one event at a time). When ivec becomes vecsize_used, the part below (where there are vector APIs, with calls for bunches of vecsize_used events) is executed, but at the same time ivec=0 is reset, so that the next DO will initially fill a bunch.

Then in the vector part (the part 'below'), either vector APIs are used, or explicit loops over vecsize_used events. So all looks ok.

One final comment for my own sanity, passcuts takes a vecsize_used parameter but is actually mainly a scalar event by event call. The parameter is passed because on the "first" call there is some update_param for a full bunch of events. This part of the code looks really dangerous to me. There is a mixture of local FIRSTTIME and a common CUTSDONE. What should be a simple "momenta in for one event, falg true/false out for one event" becomes something whose state depends on a common which is initialized and reset not clear where? (That said, I even wonder if I introduced some of this myself 2-3 years ago? I ceratinly added the vecsize_used as a parameter, because previously it was a vecsize_memmax from a parameter... but definitely the mix of scalar and vector behaviour with commons was already there). Maybe this could be reviewed als to make sure there are no bugs? (see #987)

Voila. My summary: I am unable to see a problem here about a nb_warp_used missing somewhere. So I would merge june24. But please have a look yourself too Olivier.

Thansk!
Andrea

@valassi valassi changed the title (june24) check if vecsize_used and nb_warp_used are needed in dsample.f (june24) check if anything needs to be changed in dsample.f in the use of vecsize_used or nb_warp_used Sep 2, 2024
@oliviermattelaer
Copy link
Member

Ok convincing argument. Well done
Not sure what confuses me, but whatever, the only thing to do is to apologize for the false alarm and to thank you for this super clear argument and for the time that you took on this.

So let me close this issue and let merge the associated PR

@valassi
Copy link
Member Author

valassi commented Sep 3, 2024

No need to apologize Olivier, I very much prefer false alarms than missing real issues! Thanks so much to you, let's move on. I will start merging elsewhere...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants