Re: [mobicents-public] Mobicents weekly meeting notes, March 30, 2011

That's correct. Thank you, Bartek.

On Wed, Mar 30, 2011 at 2:25 PM, Bartosz Baranowski <baranowb@gmail.com> wrote:
> /core-chat
> @Meta-Inf
>  - Eduardo
>  + Amit
>  :)
> Bartosz Baranowski
> JBoss R & D
> ==================================
> Word of criticism meant to improve is always step forward.
>
>
> On Wed, Mar 30, 2011 at 8:16 PM, Ivelin Ivanov
> <ivelin.atanasoff.ivanov@gmail.com> wrote:
>>
>> Attendees:
>>  Alex, Bartek, Eduardo, Jean, Luis, Oleg, Pavel, Shay, Vladimir, Ivelin
>>
>> Summary:
>>  #1 JBCP 1.2.11 code freeze in 2 weeks; 5.1.CR2 in QE
>>  #2 MMS 2.x, JSR 309, new scheduler integration, load tests: new
>> scheduler solves some of the blocking problems, new RTP streaming
>> problem
>>  #3 MSS 1.6 more bug fixes; 2.0 progress with cloud packaging and CTF
>>  #4 Diameter 1.4: Beta released, HA tests and TLS next
>>  #5 JSLEE 2.4.CR1 released; big update in EclipSLEE
>>  #6 Presence/RCS: porting to latest JSLEE
>>  #7 SS7: working on Dialogic integration, finalizing MAP for B8
>> release, tested USSD simulation;
>>  #8 QE: JBCP 5.1 testing; hudson scripts for EclipSLEE; tuning perf
>> regression parameters
>>
>>
>> Log:
>> -----------
>> [10:29] <ivelin> lol, iPhone still has daylight time change bugs
>> [10:29] <ivelin> its such a mess
>> [10:30] <ivelin> there are even half time zones in the US and I think
>> India
>> [10:30] <ivelin> Pavel, take the stage
>> [10:30] <slegrik> ok
>> [10:30] <slegrik> #1
>> [10:30] <slegrik> JBCP-5.1.0.GA - CR02 is in QA, there was found a
>> wrong config file issue, so this has been resolved immediately
>> [10:31] <slegrik> Luis might update here, how testing goes and when he
>> expects sign off
>> [10:31] <slegrik> JBCP-1.2.11 - There is a planned code freeze in 2 weeks,
>> so still in fixing issues phase
>> [10:31] <slegrik> Besides I have updated JBCP release schedule, will
>> post on jbcp-dev till end of this week. Have prepared preso and demo
>> for
>> [10:31] <slegrik> Open House in Brno. Went over the EAP & SOA
>> Productization teams meeting minutes and conclusions from Westford
>> productization gathering
>> [10:32] <slegrik> barreiro, want to update on JBCP-5.1.0.CR02 QA status ?
>> [10:32] <barreiro> sure.
>> [10:32] <barreiro> I put up a test plan at
>> https://docspace.corp.redhat.com/docs/DOC-62021
>> [10:33] <barreiro> looks a bit empty, but I plan to have everything
>> ready during the beginning of next week.
>> [10:36] <slegrik> that's all from me, in case of no questions
>> [10:37] <ivelin> ok, #2
>> [10:37] <ivelin> #2 MMS 2.x, JSR 309, new scheduler integration, load
>> tests
>> [10:38] <oleg__> I have two news
>> [10:38] <oleg__> one bad and one good
>> [10:39] <oleg__> the bad is that I hang on RTP connection tests for
>> the whole week and I don't have a complete solution yet
>> [10:39] <oleg__> probably that the problem is OS or JVM because the
>> same tests on Vladimir's machine passed always
>> [10:40] <jeand> RTP connection tests of JSR 309 ?
>> [10:40] <oleg__> however I need to double check everything that it is
>> not the bug inside code
>> [10:40] <oleg__> Jean, no. for MMS core
>> [10:41] <jeand> what's your OS ?
>> [10:41] <oleg__> what is good... Vladimir created wireshark trace for
>> RTP exchange and it gives "external" view on scheduler
>> [10:42] <oleg__> the result is as follows:
>> [10:42] <oleg__> Max delta = 0,00 ms at packet no. 0  Max jitter =
>> 0,00 ms. Mean jitter = 0,00 ms. Max skew = 0,00 ms. Total RTP packets
>> = 255   (expected 255)   Lost RTP packets = 0 (0,00%)   Sequence
>> errors = 0  Duration 5,09 s (0 ms clock drift, corresponding to 1 Hz
>> (+0,00%)
>> [10:42] <oleg__> Jean, windows
>> [10:42] <oleg__> so it is so perfect that I still don't believe it
>> [10:43] <vralev> i dont understand how i could be so wrong on your side
>> [10:43] <vralev> it
>> [10:43] <oleg__> Vladimir, on my side the failure rate > 50%
>> [10:44] <vralev> well, it is just too strange
>> [10:44] <oleg__> I can't create trace on windows using local interface
>> so I am monitoring number of packets in check points and spectra of
>> signal
>> [10:44] <vralev> don't use windows
>> [10:45] <jeand> is this wireshark on localhost vralev ?
>> [10:45] <vralev> jeand: yes
>> [10:45] <oleg__> what is more strange that I am detecting (if it
>> happens) losses only on rx leg on second connection always
>> [10:45] <ivelin> yes, Oleg, try Linux. We are spending so much time on
>> problems that don't show up on Vladimir's machine.
>> [10:45] <jeand> might be worth a try on a real network ?
>> [10:46] <oleg__> Ivelin, it may be an indicator of a problem
>> [10:46] <vralev> jeand: not worth it right now
>> [10:46] <oleg__> I would prefer to confirm that this is OS specific
>> problem and have explanation for it
>> [10:47] <oleg__> anyway most important thing that real time scheduling
>> works!
>> [10:47] <jeand> oleg__, if you try linux and still see the problem
>> [10:47] <oleg__> just to compare with previous streams:
>> [10:47] <jeand> you will know if it's OS specific
>> [10:47] <jeand> or not
>> [10:48] <oleg__> Max delta = 21,99 ms at packet no. 1293  Max jitter =
>> 1,08 ms. Mean jitter = 0,76 ms. Max skew = -99,44 ms. Total RTP
>> packets = 289   (expected 289)   Lost RTP packets = 0 (0,00%)
>> Sequence errors = 0  Duration 5,86 s (-372 ms clock drift,
>> corresponding to 7492 Hz (-6,35%)
>> [10:48] <oleg__> Jean, it may be concurrency bug
>> [10:49] <oleg__> very strange that OS will generate 15% losses on UDP,
>> right?
>> [10:50] <jeand> you never know with windoze
>> [10:50] <oleg__> both traces made by Vladimir, one is with old
>> scheduler and second with new
>> [10:50] <oleg__> so we have rights to compare
>> [10:51] <vralev> those are different traces, IMO not good to compare like
>> this
>> [10:51] <oleg__> why they are different?
>> [10:52] <oleg__> the same machine?
>> [10:52] <vralev> different scenarios
>> [10:52] <oleg__> but the quality of stream should be irrespective of
>> scenarios
>> [10:53] <vralev> that's what we need to find out
>> [10:54] <oleg__> the current path more complex even
>> [10:55] <oleg__> and before I didn't see zeros in jitter and clock skew
>> [10:55] <oleg__> for all scenarios, even for simple one-way transmission
>> [10:56] <oleg__> I am done, any questions?
>> [10:57] <vralev> oleg__: we need to run real test against the whole MMS
>> [10:57] <vralev> to wrap the scheduler integration
>> [10:57] <oleg__> Vladimir, the test you ran is the whole MMS except
>> controller
>> [10:58] <oleg__> or testcase plays role of controller
>> [10:58] <vralev> oleg__: there is no codec either
>> [10:58] <oleg__> let's enable codecs
>> [10:58] <vralev> too simplified
>> [10:58] <vralev> well, once the controller is integrated we can run 309
>> [10:58] <oleg__> just look into path before say "too simplified" :)
>> [10:58] <vralev> that's the point
>> [10:59] <vralev> so that should be the next immediate step right?
>> [10:59] <oleg__> I want to find reason of failures on windows first
>> [10:59] <ivelin> guys, this arguing is not going anywhere....its tiring.
>> [11:00] <ivelin> Vladimir, please try your route independently,
>> whichever one works wins, and the other one just has to shut up
>> [11:00] <vralev> ivelin: my route is - using the old scheduler where
>> 309 works, it can't win
>> [11:00] <ivelin> I'm almost at a point of jumping into MMS code. Scary
>> thought, but can't wait forever on basic functionality to work. 309
>> should have passed 5 months ago!
>> [11:01] <vralev> ivelin: but let me give you a small update on 309 and
>> load testing (the old scheduler)
>> [11:01] <ivelin> why can't it win if 309 works?
>> [11:02] <vralev> ivelin: well, it's single-threaded and there are too
>> many limitations
>> [11:02] <vralev> for example the load tests I was supposed to do
>> [11:02] <vralev> I ran it and it couldnt sustain any load
>> [11:03] <vralev> after looking into the test more carefully it turns
>> out 70% of the runs the player doesn't play correct audio track
>> [11:03] <ivelin> why?
>> [11:04] <vralev> so this is reproducible even without load
>> [11:04] <vralev> there is a race condition for this particular
>> scenario with PLAYER-CONF-DETECTOR that is the reason
>> [11:05] <vralev> other topologies seem not to be affected
>> [11:05] <vralev> so this is reproducible but it's not very easy to fix
>> [11:06] <oleg__> I would say impossible without reworking scheduler
>> [11:06] <vralev> oleg__: yes i want to try with the new scheduler
>> [11:06] <vralev> for which we need the controller part
>> [11:06] <oleg__> Vladimir, do you think that MGCP can not be mapped to
>> 309 or it is a problem?
>> [11:06] <vralev> this will solve a good portion of the 309 mystery
>> [11:08] <vralev> oleg__: for the tests that I care right now MGCP can
>> be mapped to 309, I am not really concerned about this in any way
>> because we can always extend MGCP commands to fill whatever is missing
>> [11:08] <vralev> just need to have the controller so I can run the load
>> test
>> [11:08] <oleg__> ok, good
>> [11:09] <oleg__> so can we consider that the problem of scheduling in mms
>> exist?
>> [11:09] <oleg__> I mean 2.1B1 which is used for 309 tests
>> [11:09] <ivelin> what exactly will solve the mystery?
>> [11:09] <vralev> oleg__: I don't know yet if scheduling will fix the load
>> tests
>> [11:10] <oleg__> if I am correct you have said that problem is
>> reproducible without load
>> [11:11] <vralev> ivelin: if the new scheduler solves the race
>> condition problem I see then this will unlock a big group of tests to
>> pass in batches
>> [11:11] <oleg__> Ivelin, mystery - unpredictable behaviour due to miss
>> scheduling
>> [11:11] <vralev> ivelin: as opposed to seeing them fail randomly
>> [11:12] <oleg__> Vladimir, what does "racing" mean for a single thread?
>> [11:13] <ivelin> when would we know if the new scheduler provides
>> consistency?
>> [11:13] <oleg__> Ivelin, we know it now. See stats above
>> [11:14] <vralev> oleg__: I am not sure it is scheduling problem - it
>> is still racing because you basically implemented multithreading
>> inside single thread like a green-thread
>> [11:14] <oleg__> how is it?
>> [11:15] <vralev> oleg__: I don't know how it happens, it is difficult to
>> trace precisely, I will get back to you on that
>> [11:15] == abhayani [~abhayani@115.252.103.219] has joined #mobicents
>> [11:16] <vralev> oleg__: if the new scheduler fixes it - then great though
>> [11:17] <ivelin> let's move on with the agenda.
>> [11:17] <ivelin> #3 MSS 1.6, 2.0
>> [11:20] <jeand> ok
>> [11:20] <jeand> regarding MSS 2.0
>> [11:20] <jeand> #1 MSS in the cloud - Built RPM of MSS and SIP LB and
>> EC2 Images and got Amazon EC2 Access. Target for next week is running
>> the cluster built by Boxgrinder and cantierre on Amazon. Started talks
>> about the management
>> [11:20] <jeand> #2 MSS 2.X and CTF - CTF is code complete for the
>> first release, targeting 1.0 release and MSS 2.0 release for 2nd week
>> of April
>> [11:20] <jeand> regarding MSS 1.6
>> [11:21] <jeand> made good progress and closing the gap on the number
>> of issues reported but there is still a bunch of issues to fix (around
>> 10)
>> [11:21] <jeand> http://www.mobicents.org/mss-roadmap.html
>> [11:21] <jeand> before able to move on to completing the features
>> (mainly SNMPv3 support)
>> [11:22] <jeand> the issues left are not simple ones (related to B2BUA
>> forking and a security stress test)
>> [11:23] <jeand> we also fixed a bug on JBCP 1.2.X
>> [11:23] <jeand> that's all from my side
>> [11:25] <ivelin> nice. Did Eucalyptus deployment work?
>> [11:25] <ivelin> supposedly its AWS compatible
>> [11:25] <jeand> not yet
>> [11:26] <jeand> you have to transform the AMI into EMI
>> [11:26] <jeand> and it's a manual step
>> [11:26] <jeand> so the first target is AWS now
>> [11:26] <jeand> since we have to go through it anyway to do
>> Eucalyptus deployments
>> [11:26] <jeand> I saw some commits today from marek
>> [11:26] <jeand> regarding eucalyptus
>> [11:27] <jeand> maybe they fixed stuff so that we can directly create an
>> EMI
>> [11:27] == slegrik [~pslegr@186.136.broadband9.iol.cz] has quit [Ping
>> timeout: 276 seconds]
>> [11:31] <ivelin> its great that Marek is helping. He's great.
>> [11:31] <ivelin> #4 Diameter 1.4
>> [11:31] <jeand> right indeed
>> [11:31] == slegrik [~pslegr@186.136.broadband9.iol.cz] has joined
>> #mobicents
>> [11:32] <@alexandrem> We've released yesterday 1.4.0.BETA1
>> [11:32] <@alexandrem> with better cluster functionality, with
>> fine-grained data replication, more performant, stable and compliant
>> [11:33] <@alexandrem> next goals are: setting up HA/FT scenarios with
>> SLEE/MSS, setting up load tests at hudson, continue the blog series
>> [11:34] <@alexandrem> and finish the TLS support work
>> [11:36] <ivelin> there are currently no HA scenarios tested on Hudson?
>> [11:36] <@baranowb> only in unit tests
>> [11:36] <@baranowb> and only diameter
>> [11:37] <@alexandrem> there's at least one HA scenario for each
>> application (which has stateful mode)
>> [11:39] <jeand> HA tests are for standalone mode right ?
>> [11:39] <@baranowb> y, did not move further with MSS example
>> [11:39] <@baranowb> by standalone you mean only diameter ?
>> [11:40] <ivelin> what is the plan for implementing distributed ha
>> tests like the  ones for MSS?
>> [11:40] <jeand> I mean no SLEE or MSS involved
>> [11:41] <jeand> ivelin, if we could move that to junit tests that would be
>> best
>> [11:41] <@alexandrem> jeand: yes, standalone.. just the Diameter
>> client and server in standalone
>> [11:41] <jeand> maintenance of this thing is a pain and there is often
>> race conditions
>> [11:44] <@alexandrem> ivelin: first we will analyze common message
>> flows and detect possible point of failure and create such tests
>> [11:45] <ivelin> ok, that would be great. There is more new leads on
>> Diameter. It would be good marketing to blog about the HA tests.
>> [11:46] <ivelin> #5 JSLEE 2.4
>> [11:46] <@baranowb> ok, Ed seems to be still gone
>> [11:46] <@baranowb> so 2.4.0.CR1 "SAKURA" has been released
>> [11:47] <@baranowb> luckily Eduardo managed to solve all problems with
>> JSIP stack(thanks to Jean I think, not sure)
>> [11:47] <@baranowb> and JSLEE support early dialog failover
>> [11:47] <@baranowb> also since new cluster is used
>> [11:48] <@baranowb> further goodies in this release are:
>> [11:48] <@baranowb>  - new cluster, tha
>> [11:48] <@baranowb> - support of SLEE node start in live cluster(this
>> includes changes in RA code to conform to specs a bit more)
>> [11:48] <@baranowb> - minor upgrades in RA deps, like XCAP/HTTP
>> [11:48] <@alexandrem> - EclipSLEE support for RA Type and RA creation
>> [11:49] <@baranowb> - new subscribe features in SIP subscribe enabler
>> and XDM enabler
>> [11:49] <@baranowb>  - minor fixes in SLEE tools + big update in eclipSLEE
>> [11:49] <@baranowb> in 2 days
>> [11:50] <@baranowb> it has ~60 downloads I think
>> [11:50] <@baranowb> maybe more now
>> [11:50] <@baranowb> devs for next release on my side are concentrated
>> around enablers and their features
>> [11:51] <@baranowb> not sure what Eduardo was doing, but I think he
>> was working on presence, to port it to 2.4
>> [11:51] <@baranowb> alex anything to add  ?
>> [11:51] <@alexandrem> guess that's it
>> [11:52] <ivelin> yes, I forgot Eduardo is meeting a prospect today
>> [11:52] == jeffprestes [~jeffprest@nat/redhat/x-eazlkqgayushmvqy] has
>> joined #mobicents
>> [11:52] <ivelin> although I thought they would be done by now
>> [11:52] <ivelin> #7 SS7
>> [11:53] <abhayani> k. All the bugs for 8.1 are fixed.
>> [11:53] <@baranowb> #6 quick update - Ed is porting to new JSLEE I
>> think, Ive been digging into xcap diff
>> [11:53] <@baranowb> #7
>> [11:54] <abhayani> we have worked with dialogic native code
>> [11:54] <abhayani> and successfully simulated the test environment for
>> USSD messages
>> [11:55] <abhayani> last step of testing before till MAP stack + MAP RA
>> before releasing 8.1
>> [11:55] <@baranowb> rephrase
>> [11:56] <abhayani> last step of testing MAP stack + MAP RA before
>> releasing 8.1
>> [11:56] <abhayani> docs update too remaining
>> [11:58] <ivelin> 8.1?
>> [11:58] <ivelin> when did Mobicents SS7 reach v8.1?
>> [11:58] <abhayani> release 8.1
>> [11:58] <@baranowb> B8
>> [11:59] <slegrik> shouldn't we go with B9 instead of 8.1 ?
>> [11:59] <slegrik> sounds confusing to me
>> [11:59] <abhayani> nope, last we had B7 release
>> [11:59] <abhayani> and we decided B8 to divide into 2
>> [11:59] <abhayani> 8.1 and 8.2
>> [12:00] <abhayani> this was to expedite release 8.1 which has SCTP so
>> community can start using it without SS7 hardware
>> [12:00] <abhayani> 8.2 is about new messages for MAP + some CLI fixes
>> [12:00] <abhayani> + ISUP
>> [12:01] <ivelin> B9
>> [12:02] <slegrik> so it is gonna be SS7-1.0.0.Beta8.1 and
>> SS7-1.0.0.Beta8.2 ?
>> [12:02] <ivelin> no need to split betas because they are not official
>> releases
>> [12:02] <abhayani> yes Pavel.
>> [12:02] <slegrik> have never seen that
>> [12:02] <ivelin> just make it B8 and B9
>> [12:02] <abhayani> ivelin, we will have long B's ;)
>> [12:02] <abhayani> so thought of breaking into 2 rather than using new
>> [12:04] <ivelin> no biggie
>> [12:04] <jeand> I have to go
>> [12:04] <abhayani> k in that case will release it as B8
>> [12:05] <ivelin> thanks, Jean
>> [12:05] <ivelin> #8 QE
>> [12:06] <ivelin> Luis?
>> [12:07] <barreiro> Besides JBCP 5.1 testing already mentioned in #1 I
>> have worked with alex on setting up the build of Eclipslee on hudson
>> [12:08] <ivelin> how did the perf tests do this weekend?
>> [12:10] <barreiro> perf tests ran fine.
>> [12:10] <barreiro> They seem to be back on track :)
>> [12:11] <ivelin> nice. Regressions detected?
>> [12:12] <barreiro> there were a couple of failures due to some goals
>> that were adjusted ... will fix them for next run
>> [12:12] <barreiro> There was no regression.
>> [12:13] <ivelin> the new goals were too tight?
>> [12:14] <barreiro> yes. Some parameters change more from test to test
>> than others.
>> [12:16] <ivelin> interesting. The goal is to find parameters that
>> don't need to change much so we can avoid false test failures
>> [12:18] <ivelin> ok, are we done with the agenda?
>> [12:21] <ivelin> thanks everyone for joining
>
>
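
The two RTP statistics summaries quoted in the log (max delta, max jitter, mean jitter) can be approximated from a packet trace. The following is a minimal sketch, assuming an RFC 3550-style interarrival jitter estimator and an 8000 Hz RTP clock; the function name, the input format (arrival time in ms paired with the RTP timestamp) and the clock rate are assumptions made for illustration and are not taken from the meeting or from MMS code.

```python
from typing import Dict, List, Tuple


def rtp_stream_stats(
    packets: List[Tuple[float, int]],
    clock_rate_hz: int = 8000,  # assumed RTP clock rate (typical for narrowband audio)
) -> Dict[str, float]:
    """Compute max delta, max jitter and mean jitter (all in ms).

    `packets` is a list of (arrival_time_ms, rtp_timestamp) in arrival order.
    RTP timestamp wraparound is ignored to keep the sketch short.
    """
    if len(packets) < 2:
        return {"max_delta_ms": 0.0, "max_jitter_ms": 0.0, "mean_jitter_ms": 0.0}

    jitter = 0.0       # running jitter estimate, starts at zero
    jitters = [0.0]    # estimate recorded for the first packet
    max_delta = 0.0
    prev_arrival, prev_ts = packets[0]

    for arrival, ts in packets[1:]:
        # Time between consecutive arrivals.
        delta = arrival - prev_arrival
        max_delta = max(max_delta, delta)

        # Spacing the sender intended, derived from the RTP timestamps.
        expected = (ts - prev_ts) * 1000.0 / clock_rate_hz

        # RFC 3550 style smoothing: J = J + (|D| - J) / 16
        d = delta - expected
        jitter += (abs(d) - jitter) / 16.0
        jitters.append(jitter)

        prev_arrival, prev_ts = arrival, ts

    return {
        "max_delta_ms": max_delta,
        "max_jitter_ms": max(jitters),
        "mean_jitter_ms": sum(jitters) / len(jitters),
    }


if __name__ == "__main__":
    # A perfectly paced stream: 255 packets, one every 20 ms (160 ticks at 8 kHz).
    paced = [(i * 20.0, i * 160) for i in range(255)]
    print(rtp_stream_stats(paced))
```

A stream paced exactly to its timestamps yields zero jitter under this estimator, which is the kind of result the new-scheduler trace reported. Any scheduling slip shows up as a non-zero jitter value.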