Whither test_network_sync? #57

Closed · rouson opened this issue Apr 20, 2022 · 11 comments

rouson (Collaborator) commented Apr 20, 2022

After attempting to develop test_network_sync.f90 into a test that reports pass or fail, and after seeking advice from Tom Clune, who develops the pFUnit testing framework, I'm not confident there's a way to write a test that is both meaningful and straightforward. Without doing extensive research myself, my sense is that the formula employed in the randn functions has been thoroughly studied and there's not much to test in its behavior beyond what is already known. @milancurcic, if you have thoughts on a useful test, please let me know. Otherwise, I recommend removing the test, given that it currently just prints the generated samples without any check on their statistical properties.
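
For reference, here's a minimal, self-contained sketch of what such a statistical check might look like. It assumes a Box-Muller transform, which is one common formula behind randn-style generators (an assumption on my part, not a statement about this library's implementation), and the tolerances are arbitrary illustrations rather than derived bounds:

program check_randn_stats
  implicit none
  integer, parameter :: n = 100000
  real, parameter :: pi = 4*atan(1.0)
  real :: u1(n), u2(n), samples(n), mean, stdev
  call random_number(u1)
  call random_number(u2)
  ! Box-Muller transform: maps uniform variates to approximately N(0,1).
  ! (1 - u1 avoids log(0), since random_number yields values in [0,1).)
  samples = sqrt(-2*log(1 - u1))*cos(2*pi*u2)
  mean = sum(samples)/n
  stdev = sqrt(sum((samples - mean)**2)/(n - 1))
  ! Loose, arbitrary tolerances chosen for illustration only.
  if (abs(mean) < 0.05 .and. abs(stdev - 1) < 0.05) then
    print *, "PASS: sample statistics near N(0,1)"
  else
    print *, "FAIL: mean =", mean, " stdev =", stdev
  end if
end program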

rouson (Collaborator, Author) commented Apr 20, 2022

Also, the sync type-bound procedure suffers from an unfortunate ambiguity in its name. I understand the reason for the name, but I'd just call it broadcast or something similar because there's no explicit synchronization (sync all or sync images) and the standard imposes no precise synchronization semantics on collective subroutines other than noting that "the calculations performed by a collective subroutine have some internal synchronizations."
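
For context, a minimal sketch of the broadcast behavior in question, using the intrinsic collective co_broadcast directly (compile with caf and run with cafrun, as in the example further down):

program broadcast_demo
  implicit none
  real :: weights(3)
  call random_number(weights)                 ! images may start with different values
  call co_broadcast(weights, source_image=1)  ! all images now hold image 1's values
  print *, this_image(), ":", weights
end program

Note that no sync all appears: the only synchronization is whatever the collective performs internally.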

milancurcic (Member) commented

Yes, I wouldn't worry about this one as it will likely go away. I suggest focusing on io, datasets, and activations.

milancurcic (Member) commented

I looked at what sync does to remind myself. It updates the weights and biases on all images > 1 to match those on image 1. A passing test for this would ensure they are indeed exactly the same after the call to sync.
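
A sketch of such a test, with a plain coarray standing in for the network's weights and biases and co_broadcast standing in for the sync call (placeholder names; the real test would traverse the network type's components):

program test_sync_sketch
  implicit none
  real :: weights(5)[*]                       ! coarray so any image can read image 1's copy
  call random_number(weights)                 ! images start with independent values
  call co_broadcast(weights, source_image=1)  ! stands in for the call to sync
  sync all                                    ! make image 1's final values safe to read
  ! Exact equality is the right check: the broadcast copies values bit-for-bit.
  if (any(weights /= weights(:)[1])) then
    print *, "FAIL: image", this_image(), "differs from image 1"
  else
    print *, "PASS: image", this_image(), "matches image 1"
  end if
end program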

rouson (Collaborator, Author) commented Apr 20, 2022

@milancurcic perfect. Thanks. I'll submit a PR with the suggested test.

rouson (Collaborator, Author) commented Apr 20, 2022

@milancurcic It still might be that test_network_sync is unnecessary. Fortran 2018 provides an intrinsic subroutine, random_init(repeatable, image_distinct), where both arguments are logical with intent(in), and where passing image_distinct=.false. ensures that each image uses the same sequence of random numbers. At scale, some amount of redundant computation generally outperforms communication, so the code will likely scale better if we replace the co_broadcast with a call to random_init with image_distinct=.false.

The GNU, Intel, and Cray compilers support random_init. Some important updates to the behavior of random_init will appear in GCC 12, however, so I'll need to check whether GCC 11 works for our purposes. I just submitted a feature request to NAG, which usually responds quickly, so there's a reasonable chance that the NAG compiler will support random_init soon too.

rouson (Collaborator, Author) commented Apr 20, 2022

Actually, yet another way to view this is that the test is useful either way; in fact, it should give the same result either way. How that result is produced is an implementation detail that can be considered separately.

milancurcic (Member) commented

Yes, in other words, ensuring that the initial state of the network is valid across images.

rouson (Collaborator, Author) commented Apr 21, 2022

I have determined that with gfortran 11.2.0 and the head of the main branch of OpenCoarrays, calling random_init with image_distinct=.false. produces the same random numbers on each image without the need for co_broadcast. See below. I haven't tested earlier versions of gfortran or OpenCoarrays.

I recommend this approach for scalability reasons. I'll submit a PR with separate commits that add the test and replace the co_broadcast with random_init. I'm thinking of setting repeatable=.true. so that the test result is deterministic. The current test is non-deterministic, so if you prefer that approach, let me know. I suppose we could also consider having both a deterministic test and a non-deterministic test. I'm not sure it matters much either way, given that our only goal is to ensure the same weights on all images.

% cat random.f90 
program main
  implicit none
  real harvest(5)
  ! repeatable=.true., image_distinct=.false.: same sequence on every image
  call random_init(.true., .false.)
  call random_number(harvest)
  print *, this_image(), ":", harvest
end program

% caf random.f90 

% cafrun -n 8 ./a.out
           5 :  0.571258783      0.226831257      0.175722122      0.691333890      0.885308683    
           6 :  0.571258783      0.226831257      0.175722122      0.691333890      0.885308683    
           7 :  0.571258783      0.226831257      0.175722122      0.691333890      0.885308683    
           8 :  0.571258783      0.226831257      0.175722122      0.691333890      0.885308683    
           1 :  0.571258783      0.226831257      0.175722122      0.691333890      0.885308683    
           2 :  0.571258783      0.226831257      0.175722122      0.691333890      0.885308683    
           3 :  0.571258783      0.226831257      0.175722122      0.691333890      0.885308683  

milancurcic (Member) commented

Thanks @rouson. Any chance for this to work with GFortran 10?

rouson (Collaborator, Author) commented Apr 25, 2022

Most likely, yes. Certainly the current approach should work with GFortran 10. If random_init works with 10, I'll submit that approach as a separate PR.

milancurcic (Member) commented

Closing in favor of #78.
