Re: waiting on UNBLOCK after connect timed out

Re: waiting on UNBLOCK after connect timed out

Bela Ban
I'm trying to reproduce this. Can you attach your config? I assume it
is tcp.xml (TCP:TCPPING), correct?

And you use JBoss to run this? I'll try with the simple ViewDemo and
your config first...

How do you reproduce this? Just by randomly killing and restarting a
member from a group of 3?

We're currently simplifying FLUSH (in 2.6). Can you use a snapshot of
2.6 (ready in about a week) to see whether the issue still occurs?

Brian Campbell wrote:

>> We'd still like to look into this a bit more though, to see
>> whether you've detected a bug or not. One important tool to
>> do so would be a stack trace *for all 3 members* when this
>> happens again. This way, we can see whether we're blocked
>> (for example) in a block() callback (application code).
>>
>> --
>> Bela Ban
>> Lead JGroups / Clustering Team
>> JBoss - a division of Red Hat
>>
>>    
>
> Sorry for the slow response - I've been on a short vacation.  But now
> that I'm back, attached are logs that contain a full "kill -SIGQUIT"
> thread dump from the three servers.  103 came up first then 104 and 105.
> I bounced 104 and 105 a few times (only one at a time though) and
> eventually on startup 104 gets the "waiting on UNBLOCK after connect
> timed out" message followed by "FLUSH block at 172.16.19.104:7600 for
> ever."   At that point I took the thread dump on all three.
>
> Sometimes this occurs right away and sometimes I have to bounce many
> times.  Not sure if that means anything.
>
> FWIW, I don't have anything (at least I don't think I do) in a block()
> callback other than some simple logging.
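
Where sending SIGQUIT to the JVM is not practical, roughly the same
information can be collected from inside the process. The following is a
minimal sketch using only the JDK's Thread.getAllStackTraces(); the class
and method names are made up for illustration, and unlike a real SIGQUIT
dump it does not report held monitors or deadlocks. To be useful for this
kind of hang it would have to be triggered on all 3 members while the
joiner is stuck.

```java
import java.util.Map;

public class ThreadDumpSketch {

    /** Formats the stack of every live thread in this JVM as plain text. */
    public static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            // Header line: thread name and its current state (RUNNABLE, WAITING, ...)
            sb.append('"').append(thread.getName()).append('"')
              .append(" state=").append(thread.getState()).append('\n');
            // One line per stack frame, innermost frame first
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

A thread that sits in WAITING or BLOCKED with a block() callback or a
FLUSH frame near the top of its stack is the kind of thing to look for.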

--
Bela Ban
Lead JGroups / Clustering Team
JBoss - a division of Red Hat


_______________________________________________
javagroups-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/javagroups-users

Re: waiting on UNBLOCK after connect timed out

Bela Ban
Another tip: don't use TCP_NIO, use TCP instead. TCP_NIO has not been
tested as well as TCP; for example, it is not part of our CruiseControl
tests, whereas TCP and UDP are.

And, if you can use IP multicasting, I recommend TCP:MPING, which uses
IP multicasting for discovery and TCP for sending messages. If you can
do this, you can also set the start_port in TCP to 0, so the OS picks
the ports, and we can avoid running into the reincarnation issue.
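
To illustrate what "the OS picks the ports" means at the socket level,
here is a small JDK-only sketch. It is not JGroups code: the class and
method names are invented for the example, and it assumes the
reincarnation issue refers to a restarted member coming back on the same
fixed address and port as its previous incarnation. Binding to port 0
asks the OS for any free port, so a restarted process will typically not
reuse the port its predecessor had.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

public class EphemeralPortSketch {

    /**
     * Opens two server sockets on port 0 at the same time and returns the
     * ports the OS assigned. Because both sockets are open together, the
     * two ports are guaranteed to differ.
     */
    public static int[] twoOsAssignedPorts() {
        try (ServerSocket first = new ServerSocket(0);
             ServerSocket second = new ServerSocket(0)) {
            return new int[] { first.getLocalPort(), second.getLocalPort() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] ports = twoOsAssignedPorts();
        System.out.println("OS-assigned ports: " + ports[0] + " and " + ports[1]);
    }
}
```

With a fixed start_port such as 7600, every restart binds the same
address again. With OS-assigned ports the other members cannot know the
port in advance, which is why this only works together with a discovery
mechanism such as MPING.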

--
Bela Ban
Lead JGroups / Clustering Team
JBoss - a division of Red Hat


Re: waiting on UNBLOCK after connect timed out

Brian Campbell-7
In reply to this post by Bela Ban
And this time it didn't like the .zip.  Trying again with a tar.gz.

> -----Original Message-----
> From: Brian Campbell
> Sent: Tuesday, October 02, 2007 11:04 AM
> To: Brian Campbell; Bela Ban
> Cc: [hidden email]
> Subject: RE: [javagroups-users] waiting on UNBLOCK after connect timed out
>
> Sorry, SourceForge bounced my last post because the attachment was too
> big.  This time it's a source-only attachment, but you'll need to add
> the JG 2.5, log4j and commons-logging jars to the lib directory.
>
> > -----Original Message-----
> > From: Brian Campbell
> > Sent: Tuesday, October 02, 2007 11:01 AM
> > To: 'Bela Ban'
> > Cc: [hidden email]
> > Subject: RE: [javagroups-users] waiting on UNBLOCK after connect timed out
> >
> > I thought I had included my config on an earlier post but maybe not.
> > Anyway, we are using Jboss/Jetty as a servlet container but I've been
> > able to reproduce the issue outside of that container.
> >
> > I tried to isolate the way our app uses jgroups and pulled it out into
> > a little application (which I've attached). You can build it with ant.
> > Then in the build dir you can run it with go.sh.
> > cluster-protocol-stacks.xml has the stack config.
> > props.props has some properties that further define the config (like
> > which stack to use from the xml file).
> >
> > Using this little app I've been able to reproduce the issue much the
> > same as with our full application.  Like you said - randomly killing
> > and restarting a member from a group of three (sometimes it even
> > happens on initial starting of the 3).  I've been running the tests
> > using three independent machines each of which has two NICs.  I'm
> > using the props.props file to bind jgroups to the secondary NICs which
> > are all on an isolated VLAN.
> >  
> >
> > > -----Original Message-----
> > > From: Bela Ban [mailto:[hidden email]]
> > > Sent: Tuesday, October 02, 2007 12:53 AM
> > > To: Brian Campbell
> > > Cc: [hidden email]
> > > Subject: Re: [javagroups-users] waiting on UNBLOCK after connect timed out
> > >
> > > I'm trying to reproduce this. Can you attach your config ? I assume it
> > > is tcp.xml (TCP:TCPPING), correct ?
> > >
> > > And you use JBoss to run this ? I'll try with the simple ViewDemo and
> > > your config first...
> > >
> > > How do you reproduce this ? Just randomly killing and restarting a
> > > member from a group of 3 ?
> > >
> > > We're currently simplifying FLUSH (in 2.6), can you use a snapshot of
> > > 2.6 (ready in ca 1 week) to see whether the issue still occurs ?
> > >
> > > Brian Campbell wrote:
> > > >> We'd still like to look into this a bit more though, to
> > > see whether
> > > >> you've detected a bug or not. One important tool to do so
> > > would be a
> > > >> stack trace *for all 3 members* when this happens again.
> > > This way, we
> > > >> can see whether we're blocked (for example) in a block()
> > callback
> > > >> (application code).
> > > >>
> > > >> --
> > > >> Bela Ban
> > > >> Lead JGroups / Clustering Team
> > > >> JBoss - a division of Red Hat
> > > >>
> > > >>    
> > > >
> > > > Sorry for the slow response - I've been on a short
> > > vacation.  But now
> > > > that I'm back, attached are logs that contain a full
> > "kill -SIGQUIT"
> > > > thread dump from the three servers.  103 came up first then
> > > 104 and 105.
> > > > I bounced 104 and 105 a few times (only one at a time
> though) and
> > > > eventually on startup 104 gets the "waiting on UNBLOCK
> > > after connect
> > > > timed out" message followed by "FLUSH block at
> > > 172.16.19.104:7600 for
> > > > ever."   At that point I took the thread dump on all three.
> > > >
> > > > Sometimes this occurs right away and sometimes I have to
> > > bounce many
> > > > times.  Not sure if that means anything.
> > > >
> > > > FWIW, I don't have anything (at least I don't think I do)
> > > in a block()
> > > > callback other than some simple logging.
> > > >
> > > >
> > > >
> > > >  
> > >
> > > --
> > > Bela Ban
> > > Lead JGroups / Clustering Team
> > > JBoss - a division of Red Hat
> > >
> > >


jgtest.tar.gz (18K) Download Attachment

Re: waiting on UNBLOCK after connect timed out

Bela Ban
In reply to this post by Bela Ban
OK. I see your config uses UDP:PING, but in your stack traces I see
TCP_NIO. Is this because you had another cluster running on TCP_NIO
besides the UDP cluster, or because you switched from TCP_NIO to UDP?

I'll take a look at the demo and try to reproduce it here.

--
Bela Ban
Lead JGroups / Clustering Team
JBoss - a division of Red Hat



Re: waiting on UNBLOCK after connect timed out

Bela Ban
In reply to this post by Bela Ban
I created http://jira.jboss.com/jira/browse/JGRP-603 to look into this.

--
Bela Ban
Lead JGroups / Clustering Team
JBoss - a division of Red Hat



Re: waiting on UNBLOCK after connect timed out

Brian Campbell-7
I'm not sure I understand the comments in the issue report.  Are you
saying that it's just a config issue and that setting
suspect_on_send_failure="true" for TCP/TCP_NIO should do the trick?  

I've been unable to reproduce it recently with UDP, but I am reasonably
certain that one of the first times we saw the problem was when testing
with UDP.

> -----Original Message-----
> From: Bela Ban [mailto:[hidden email]]
> Sent: Wednesday, October 03, 2007 2:34 AM
> To: Brian Campbell
> Cc: [hidden email]
> Subject: Re: [javagroups-users] waiting on UNBLOCK after
> connect timed out
>
> I created http://jira.jboss.com/jira/browse/JGRP-603 to look
> into this.


Re: waiting on UNBLOCK after connect timed out

Bela Ban
#1 Don't use TCP_NIO.
#2 If you must use TCP, set suspect_on_send_failure to true.
#3 We're working on a much improved FLUSH protocol and are getting rid
of 2 phases. This will simplify FLUSH dramatically and make it faster,
too. Less code as well, so fewer chances of bugs.
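
For #2, a quick way to confirm that a stack file really carries the
setting is to check it mechanically. The sketch below uses only the JDK's
XML parser and is not part of JGroups. The class and method names are
invented, and the sample XML in main() is a made-up fragment, not the
actual cluster-protocol-stacks.xml from this thread. It assumes the
transport is configured as a TCP or TCP_NIO element with
suspect_on_send_failure as an attribute, as in the
suspect_on_send_failure="true" form quoted earlier in the thread.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SuspectOnSendFailureCheck {

    /**
     * Returns true only if the XML contains at least one TCP or TCP_NIO
     * element and every such element sets suspect_on_send_failure="true".
     */
    public static boolean tcpSuspectsOnSendFailure(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse stack XML", e);
        }
        int transportsSeen = 0;
        for (String tag : new String[] { "TCP", "TCP_NIO" }) {
            NodeList nodes = doc.getElementsByTagName(tag);
            for (int i = 0; i < nodes.getLength(); i++) {
                Element transport = (Element) nodes.item(i);
                transportsSeen++;
                // getAttribute returns "" when the attribute is absent
                if (!"true".equals(transport.getAttribute("suspect_on_send_failure"))) {
                    return false;
                }
            }
        }
        return transportsSeen > 0;
    }

    public static void main(String[] args) {
        // Hypothetical fragments for illustration only
        String withFlag =
            "<config><TCP start_port=\"7600\" suspect_on_send_failure=\"true\"/></config>";
        String withoutFlag =
            "<config><TCP start_port=\"7600\"/></config>";
        System.out.println(tcpSuspectsOnSendFailure(withFlag));    // prints true
        System.out.println(tcpSuspectsOnSendFailure(withoutFlag)); // prints false
    }
}
```

The check looks up elements by tag name anywhere in the document, so it
does not depend on how deeply the stack definitions are nested.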

--
Bela Ban
Lead JGroups / Clustering Team
JBoss - a division of Red Hat

