14 Dec, 2009

1 commit


11 Dec, 2009

1 commit


10 Dec, 2009

1 commit


09 Dec, 2009

2 commits


25 Nov, 2009

1 commit


18 Nov, 2009

1 commit

  • This problem happens under the following conditions:
    - use Directory Servers that do not have Replication Servers in the same JVM
    - use only 2 Replication Servers
    - apply a heavy load of updates on one Directory Server
    - stop the first Replication Server
    - wait long enough for millions of changes to be performed
    - restart the first Replication Server, which therefore has millions of
     changes to retrieve from the second
    - quickly stop the second Replication Server (before it has time to replicate
     the missing changes to the first RS)
    
    In that case, the DS connects to the first RS, sees that it is missing a lot of
    changes, and attempts to regenerate them from the historical information
    in the database. Unfortunately this process needs to fetch all the changes
    into memory, because it must send them to the RS in ChangeNumber order and
    therefore currently sorts them all in memory before sending them.
    
    This change fixes the problem by searching for the changes by interval. This
    avoids the memory problem because only a limited number of changes needs to be
    sorted at a time, which fits in memory.
    
    However this fix is not enough, because this whole process runs in the
    replication Listener thread, which is also responsible for managing the
    replication protocol window. While this thread is busy sending a lot of changes
    to the RS it cannot also manage the window, so the process can deadlock.
    
    So a second level of changes moves this code into a separate new thread that is
    created only when necessary.
    
    This led to the last problem I met: the creation of this new thread caused some
    concurrency problems that I had to fix by introducing synchronization between
    this new thread, the listener thread and the worker thread.
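    
    A minimal sketch of the two ideas (interval-based fetching plus a dedicated
    sender thread), assuming a simplified long-valued ChangeNumber and hypothetical
    ChangeStore/Sender interfaces rather than the actual OpenDS classes:
    
      import java.util.ArrayList;
      import java.util.Collections;
      import java.util.List;
      
      // Sketch only: resend historical changes in bounded intervals so that only
      // one interval needs to be sorted in memory at a time.
      final class IntervalResender {
          interface ChangeStore { List<Long> changesBetween(long fromCn, long toCn); }
          interface Sender { void publish(long changeNumber); }
      
          static final long INTERVAL_SIZE = 10_000;
      
          static void resend(ChangeStore store, Sender rs, long fromCn, long toCn) {
              for (long start = fromCn; start < toCn; start += INTERVAL_SIZE) {
                  long end = Math.min(start + INTERVAL_SIZE, toCn);
                  // Only the changes of this interval are held and sorted in memory.
                  List<Long> changes = new ArrayList<>(store.changesBetween(start, end));
                  Collections.sort(changes);
                  for (long cn : changes) {
                      rs.publish(cn);
                  }
              }
          }
      
          // Run the resend in a dedicated thread so the replication Listener thread
          // stays free to keep managing the protocol window.
          static Thread resendInBackground(ChangeStore store, Sender rs, long from, long to) {
              Thread t = new Thread(() -> resend(store, rs, from, to), "resend-historical-changes");
              t.start();
              return t;
          }
      }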
    
    git-svn-id: https://svn.forgerock.org/opendj/trunk@6161 41b1ffd8-f28e-4786-ab96-9950f0a78031
    gbellato
     

10 Nov, 2009

1 commit


06 Nov, 2009

1 commit

  • We introduce:
    
    - a weight, an integer assigned to each RS. Combined with the weights of the
    other RSs, it defines the percentage of the DSs in the topology (out of the
    total number of DSs) that can be connected to that RS at a time. This change
    adds the configuration of the weight, including dynamic changes, and the
    weight is also transported in Topo messages. The connection algorithm itself
    is not modified yet; a sketch of the weight computation is given after the
    list below.
    
    - to support the future connection algorithm, this change also introduces a
    Monitoring Publisher thread, which sends a Monitoring message (unchanged
    format) every 3 seconds to every DS connected to the RS. This information
    will be used by the DSs to potentially reconnect to another RS with a newer
    server state (the info is included in the monitoring messages).
    
    The new connection algorithm will take into account:
    
    - group id
    
    - generation id
    
    - server states
    
    - locality (same VM)
    
    - weight (load)
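    
    As a rough illustration of how the weights might translate into the number of
    DSs each RS can serve (the class name, method and rounding policy below are
    assumptions, not the actual OpenDS code):
    
      import java.util.LinkedHashMap;
      import java.util.Map;
      
      // Sketch only: map per-RS weights to the number of DSs each RS may serve.
      final class WeightQuota {
          static Map<String, Integer> quotas(Map<String, Integer> weightByRs, int totalDs) {
              int totalWeight = 0;
              for (int w : weightByRs.values()) {
                  totalWeight += w;
              }
              Map<String, Integer> quotaByRs = new LinkedHashMap<>();
              for (Map.Entry<String, Integer> e : weightByRs.entrySet()) {
                  // Each RS gets a share of the DSs proportional to its weight,
                  // e.g. weights 1, 2, 1 with 8 DSs give quotas 2, 4, 2.
                  quotaByRs.put(e.getKey(),
                      Math.round((float) e.getValue() * totalDs / totalWeight));
              }
              return quotaByRs;
          }
      }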
    
    
    
    git-svn-id: https://svn.forgerock.org/opendj/trunk@6097 41b1ffd8-f28e-4786-ab96-9950f0a78031
    mrossign
     

28 Oct, 2009

1 commit


27 Oct, 2009

2 commits

  • The tests repetitively open broker sessions to the Replication Server, which
    can cause some normal failures.
    
    The problem fixed here was that the tests did not attempt to connect again.
    
    git-svn-id: https://svn.forgerock.org/opendj/trunk@6056 41b1ffd8-f28e-4786-ab96-9950f0a78031
    gbellato
     
  • Several changes are included in this diff that allow replication shutdown to
    happen more quickly:
    
    - The Replication Server dbHandler thread was synchronized on the wrong variable.
    - During shutdown, Topo messages were sent to all the other RSs by the RS
     being shut down.
    - There was a leftover sleep in the ReplicationServer creation that is no
     longer necessary.
    
    git-svn-id: https://svn.forgerock.org/opendj/trunk@6054 41b1ffd8-f28e-4786-ab96-9950f0a78031
    gbellato
     

26 Oct, 2009

3 commits


22 Oct, 2009

2 commits


20 Oct, 2009

2 commits


19 Oct, 2009

4 commits


16 Oct, 2009

1 commit

  • This issue happens when a new OpenDS server is added to a topology that has been up and
    running for some time and the new server is both a Replication Server and a Directory Server.
    
    In such cases the Replication Server starts empty and is therefore very late with regard to
    the other servers.
    The Directory Server is initialized from an up-to-date DS already in the topology and is
    therefore not late.
    
    The problem is that the new DS incorrectly chooses the RS that was just installed to provide
    it with all the new changes in the topology.
    
    The DS therefore has to wait for the RS to grab all the old changes before being provided
    with the new changes.
    
    The fix is to change the algorithm used by the DS to select its RS, giving priority to the
    RSs that are more up to date; a sketch is given below.
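    
    A minimal sketch of the selection criterion, assuming a simplified notion of
    "how far the RS has caught up" (the RsInfo class and latestChangeNumber field
    are illustrative stand-ins for the actual OpenDS ServerState comparison):
    
      import java.util.List;
      
      // Sketch only: prefer the replication server whose state is the most up to
      // date, so that a freshly installed (empty) RS is not selected.
      final class RsChooser {
          static final class RsInfo {
              final String id;
              final long latestChangeNumber; // how far this RS has caught up
              RsInfo(String id, long latestChangeNumber) {
                  this.id = id;
                  this.latestChangeNumber = latestChangeNumber;
              }
          }
      
          static RsInfo chooseMostUpToDate(List<RsInfo> candidates) {
              RsInfo best = null;
              for (RsInfo rs : candidates) {
                  if (best == null || rs.latestChangeNumber > best.latestChangeNumber) {
                      best = rs;
                  }
              }
              return best;
          }
      }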
    
    Gary would like to see this in the 2.2 branch, so it will have to be back-ported there.
    
    I have also added unit tests for this case and other similar ones (e.g. a replica DS
    after init).
    
    git-svn-id: https://svn.forgerock.org/opendj/trunk@5993 41b1ffd8-f28e-4786-ab96-9950f0a78031
    gbellato
     

15 Oct, 2009

1 commit


13 Oct, 2009

2 commits


08 Oct, 2009

3 commits

  • A ReplServerStartDSMsg is now sent in the handshake phase instead of a
    ReplServerStartMsg. ReplServerStartDSMsg contains the same things as
    ReplServerStartMsg but also contains:
    
      - the replication server weight
    
      - the number of currently connected DSs on the RS
    
      => both will be used by the future new RS choice algorithm
    
    - Addition of a StopMsg (a sketch follows below), sent:
    
      - when any entity (DS, RS) is closing a connection with a peer (sent just
    before closing)
    
      - when a DS finishes phase 1 of the handshake (phase 1 was only gathering RS
    info for the RS choice, so the StopMsg is sent just after the new
    ReplServerStartDSMsg is received)
    
      => both are used to distinguish between a proper connection closure
    (no message) and an unexpected one (error log)
    
    - Compatibility between protocol V4 and V3 (and before):
    
      - changed MonitorMsg so that it is never created with a protocol version
    
      - MonitorMsg is now always sent with publish(msg, version) (the publish
    method without a version was used before, which was a bug)
    
      - TopologyMsg is now always sent with publish(msg, version) (the publish
    method without a version was used before, which was a bug)
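    
    A minimal sketch of the graceful-closure idea, assuming simplified Session and
    StopMsg types rather than the actual OpenDS protocol classes:
    
      // Sketch only: send a StopMsg right before closing a session so the peer can
      // tell a deliberate closure (StopMsg received) from an unexpected one
      // (connection dropped with no StopMsg, which gets logged as an error).
      final class GracefulClose {
          interface Session {
              void publish(Object msg) throws java.io.IOException;
              void close();
          }
          static final class StopMsg { }
      
          static void closeProperly(Session session) {
              try {
                  // Tell the peer this closure is intentional.
                  session.publish(new StopMsg());
              } catch (java.io.IOException ignored) {
                  // Best effort: the connection may already be broken.
              } finally {
                  session.close();
              }
          }
      }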
    
    
    
    git-svn-id: https://svn.forgerock.org/opendj/trunk@5950 41b1ffd8-f28e-4786-ab96-9950f0a78031
    mrossign
     
  • git-svn-id: https://svn.forgerock.org/opendj/trunk@5949 41b1ffd8-f28e-4786-ab96-9950f0a78031
    pgamba
     
  • git-svn-id: https://svn.forgerock.org/opendj/trunk@5942 41b1ffd8-f28e-4786-ab96-9950f0a78031
    pgamba
     

07 Oct, 2009

1 commit

  • The management framework allows a maximum value of 65535 for the server-id
    property. Nevertheless, the server-id in the ReplicationDomain implementation
    was managed as a short, allowing a maximum value of only 32767.
    
    With this change the code now uses an int to store the server-id, as sketched below.
    The maximum is still limited to 65535 and this is enforced by the management framework.
    
    This change should not impact compatibility, as the messages exchanged by the servers
    are not affected.
    
    This change also adds unit tests for the compatibility of the V4 protocol
    with the V3 protocol.
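    
    A minimal sketch of the storage change, with an assumed lower bound of 1
    (the names and the check are illustrative, not the actual OpenDS code):
    
      // Sketch only: a server-id up to 65535 does not fit in a signed Java short
      // (max 32767), so it is stored in an int and range-checked explicitly.
      final class ServerId {
          static final int MAX_SERVER_ID = 65535;
      
          private final int value; // an int comfortably holds 0..65535
      
          ServerId(int value) {
              if (value < 1 || value > MAX_SERVER_ID) { // lower bound of 1 is an assumption
                  throw new IllegalArgumentException(
                      "server-id must be between 1 and " + MAX_SERVER_ID);
              }
              this.value = value;
          }
      
          int intValue() {
              return value;
          }
      }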
    
    git-svn-id: https://svn.forgerock.org/opendj/trunk@5936 41b1ffd8-f28e-4786-ab96-9950f0a78031
    gbellato
     

06 Oct, 2009

1 commit


05 Oct, 2009

1 commit


04 Oct, 2009

1 commit


03 Oct, 2009

1 commit


02 Oct, 2009

2 commits


30 Sep, 2009

3 commits