mobicents-JAINSLEE: [mobicents-public] Mobicents weekly meeting notes, March 30, 2011

Attendees:
Alex, Bartek, Eduardo, Jean, Luis, Oleg, Pavel, Shay, Vladimir, Ivelin

Summary:
#1 JBCP 1.2.11 code freeze in 2 weeks; 5.1.CR2 in QE
#2 MMS 2.x, JSR 309, new scheduler integration, load tests: the new
scheduler solves some of the blocking problems; a new RTP streaming
problem is under investigation
#3 MSS 1.6 more bug fixes; 2.0 progress with cloud packaging and CTF
#4 Diameter 1.4: Beta released, HA tests and TLS next
#5 JSLEE 2.4.CR1 released; big update in EclipSLEE
#6 Presence/RCS: porting to latest JSLEE
#7 SS7: working on Dialogic integration, finalizing MAP for the B8
release, tested USSD simulation
#8 QE: JBCP 5.1 testing; hudson scripts for EclipSLEE; tuning perf
regression parameters

Log:
-----------
[10:29] <ivelin> lol, iPhone still has daylight saving time change bugs
[10:29] <ivelin> it's such a mess
[10:30] <ivelin> there are even half-hour time zones in the US and I think India
[10:30] <ivelin> Pavel, take the stage
[10:30] <slegrik> ok
[10:30] <slegrik> #1
[10:30] <slegrik> JBCP-5.1.0.GA - CR02 is in QA; a wrong config file
issue was found and resolved immediately
[10:31] <slegrik> Luis might update here, how testing goes and when he
expects sign off
[10:31] <slegrik> JBCP-1.2.11 - A code freeze is planned in 2 weeks,
so we are still in the issue-fixing phase
[10:31] <slegrik> Besides that, I have updated the JBCP release schedule
and will post it on jbcp-dev by the end of this week. I have prepared a
presentation and demo for
[10:31] <slegrik> the Open House in Brno. I also went over the EAP & SOA
Productization teams' meeting minutes and the conclusions from the
Westford productization gathering
[10:32] <slegrik> barreiro, want to update on JBCP-5.1.0.CR02 QA status ?
[10:32] <barreiro> sure.
[10:32] <barreiro> I put up a test plan at
https://docspace.corp.redhat.com/docs/DOC-62021
[10:33] <barreiro> looks a bit empty, but I plan to have everything
ready by the beginning of next week.
[10:36] <slegrik> that's all from me, in case there are no questions
[10:37] <ivelin> ok, #2
[10:37] <ivelin> #2 MMS 2.x, JSR 309, new scheduler integration, load tests
[10:38] <oleg__> I have two pieces of news
[10:38] <oleg__> one bad and one good
[10:39] <oleg__> the bad news is that I was stuck on the RTP connection
tests the whole week and I don't have a complete solution yet
[10:39] <oleg__> the problem is probably the OS or JVM, because the
same tests always passed on Vladimir's machine
[10:40] <jeand> RTP connection tests of JSR 309 ?
[10:40] <oleg__> however, I need to double-check everything to make sure
it is not a bug in the code
[10:40] <oleg__> Jean, no, for the MMS core
[10:41] <jeand> what's your OS ?
[10:41] <oleg__> the good news... Vladimir created a Wireshark trace of the
RTP exchange, and it gives an "external" view of the scheduler
[10:42] <oleg__> the result is as follows:
[10:42] <oleg__> Max delta = 0,00 ms at packet no. 0 Max jitter =
0,00 ms. Mean jitter = 0,00 ms. Max skew = 0,00 ms. Total RTP packets
= 255 (expected 255) Lost RTP packets = 0 (0,00%) Sequence
errors = 0 Duration 5,09 s (0 ms clock drift, corresponding to 1 Hz
(+0,00%)
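Wireshark's RTP stream analysis reports "Max jitter" and "Mean jitter" as interarrival jitter, which follows the estimator defined in RFC 3550 (section 6.4.1). A minimal Python sketch of that estimator, assuming an 8000 Hz RTP clock and 20 ms packets; the notes do not state which codec or packet size the MMS tests used, and the function name is hypothetical. The jitter is computed in milliseconds rather than RTP timestamp units, which only rescales the result:

```python
def rfc3550_jitter(arrivals_ms, rtp_timestamps, clock_rate_hz=8000):
    """Running interarrival jitter in ms after each packet (RFC 3550, 6.4.1).

    arrivals_ms    -- receive time of each packet, in milliseconds
    rtp_timestamps -- RTP timestamp of each packet, in clock_rate_hz ticks
    """
    jitter = 0.0
    history = []
    for i in range(1, len(arrivals_ms)):
        # Spacing the sender intended, converted from RTP clock ticks to ms.
        sent_gap_ms = (rtp_timestamps[i] - rtp_timestamps[i - 1]) * 1000.0 / clock_rate_hz
        # Spacing actually observed at the receiver.
        arrival_gap_ms = arrivals_ms[i] - arrivals_ms[i - 1]
        # D(i-1, i): change in relative transit time between consecutive packets.
        d = arrival_gap_ms - sent_gap_ms
        # Exponential smoothing with gain 1/16, as RFC 3550 specifies.
        jitter += (abs(d) - jitter) / 16.0
        history.append(jitter)
    return history

# 20 ms packets (160 ticks at 8000 Hz); the third packet arrives 16 ms late.
print(rfc3550_jitter([0, 20, 56, 60], [0, 160, 320, 480]))  # [0.0, 1.0, 1.9375]
```

A perfectly paced stream keeps every D at zero, so the jitter stays at 0.00 ms, which is what the new-scheduler trace reports.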
[10:42] <oleg__> Jean, windows
[10:42] <oleg__> it is so perfect that I still don't believe it
[10:43] <vralev> I don't understand how it could be so wrong on your side
[10:43] <oleg__> Vladimir, on my side the failure rate is > 50%
[10:44] <vralev> well, it is just too strange
[10:44] <oleg__> I can't capture a trace on Windows using the local
interface, so I am monitoring the number of packets at checkpoints and
the signal spectrum
[10:44] <vralev> don't use windows
[10:45] <jeand> is this wireshark on localhost vralev ?
[10:45] <vralev> jeand: yes
[10:45] <oleg__> what is stranger is that when losses happen, I always
detect them only on the rx leg of the second connection
[10:45] <ivelin> yes, Oleg, try Linux. We are spending so much time on
problems that don't show up on Vladimir's machine.
[10:45] <jeand> might be worth a try on a real network ?
[10:46] <oleg__> Ivelin, it may be an indicator of a problem
[10:46] <vralev> jeand: not worth it right now
[10:46] <oleg__> I would prefer to confirm that this is an OS-specific
problem and have an explanation for it
[10:47] <oleg__> anyway, the most important thing is that real-time scheduling works!
[10:47] <jeand> oleg__, if you try linux and still see the problem
[10:47] <oleg__> just to compare with previous streams:
[10:47] <jeand> you will know if it's OS specific
[10:47] <jeand> or not
[10:48] <oleg__> Max delta = 21,99 ms at packet no. 1293 Max jitter =
1,08 ms. Mean jitter = 0,76 ms. Max skew = -99,44 ms. Total RTP
packets = 289 (expected 289) Lost RTP packets = 0 (0,00%)
Sequence errors = 0 Duration 5,86 s (-372 ms clock drift,
corresponding to 7492 Hz (-6,35%)
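The "clock drift" figures in the old-scheduler trace are consistent with simple arithmetic: drift divided by stream duration gives the relative error, and scaling a nominal RTP clock by that ratio gives the effective sender clock rate. A sketch, assuming a nominal 8000 Hz clock (the notes do not name the codec, and this is not Wireshark's source code; the function name is hypothetical):

```python
def clock_drift_summary(duration_s, drift_ms, nominal_hz=8000):
    """Return (drift in percent, effective sender clock rate in Hz).

    Assumes nominal_hz is the stream's nominal RTP clock rate.
    """
    drift_ratio = (drift_ms / 1000.0) / duration_s
    return drift_ratio * 100.0, nominal_hz * (1.0 + drift_ratio)

# Old-scheduler trace: 5.86 s stream, -372 ms clock drift.
pct, rate = clock_drift_summary(5.86, -372)
print(f"{pct:.2f}% -> {rate:.0f} Hz")  # -6.35% -> 7492 Hz
```

With -372 ms over 5.86 s the packets are timestamped about 6.35% slower than real time, so the sender behaves like a 7492 Hz clock instead of 8000 Hz. The new-scheduler trace reports 0 ms drift, i.e. no such error.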
[10:48] <oleg__> Jean, it may be a concurrency bug
[10:49] <oleg__> very strange that the OS would generate 15% losses on UDP, right?
[10:50] <jeand> you never know with windoze
[10:50] <oleg__> both traces were made by Vladimir, one with the old
scheduler and the second with the new one
[10:50] <oleg__> so we have the right to compare them
[10:51] <vralev> those are different traces, IMO not good to compare like this
[10:51] <oleg__> why are they different?
[10:52] <oleg__> the same machine?
[10:52] <vralev> different scenarios
[10:52] <oleg__> but the quality of the stream should be the same irrespective of scenario
[10:53] <vralev> that's what we need to find out
[10:54] <oleg__> the current path is even more complex
[10:55] <oleg__> and before I didn't see zeros in jitter and clock skew
[10:55] <oleg__> for all scenarios, even for simple one-way transmission
[10:56] <oleg__> I am done, any questions?
[10:57] <vralev> oleg__: we need to run real test against the whole MMS
[10:57] <vralev> to wrap the scheduler integration
[10:57] <oleg__> Vladimir, the test you ran is the whole MMS except the controller
[10:58] <oleg__> or rather, the test case plays the role of the controller
[10:58] <vralev> oleg__: there is no codec either
[10:58] <oleg__> let's enable codecs
[10:58] <vralev> too simplified
[10:58] <vralev> well, once the controller is integrated we can run 309
[10:58] <oleg__> just look at the path before saying "too simplified" :)
[10:58] <vralev> that's the point
[10:59] <vralev> so that should be the next immediate step, right?
[10:59] <oleg__> I want to find the reason for the failures on Windows first
[10:59] <ivelin> guys, this arguing is not going anywhere... it's tiring.
[11:00] <ivelin> Vladimir, please try your route independently,
whichever one works wins, and the other one just has to shut up
[11:00] <vralev> ivelin: my route is - using the old scheduler where
309 works, it can't win
[11:00] <ivelin> I'm almost at a point of jumping into MMS code. Scary
thought, but can't wait forever on basic functionality to work. 309
should have passed 5 months ago!
[11:01] <vralev> ivelin: but let me give you a small update on 309 and
load testing (the old scheduler)
[11:01] <ivelin> why can't it win if 309 works?
[11:02] <vralev> ivelin: well, it's single-threaded and there are too
many limitations
[11:02] <vralev> for example the load tests I was supposed to do
[11:02] <vralev> I ran it and it couldn't sustain any load
[11:03] <vralev> after looking into the test more carefully, it turns
out that in 70% of the runs the player doesn't play the correct audio track
[11:03] <ivelin> why?
[11:04] <vralev> so this is reproducible even without load
[11:04] <vralev> there is a race condition in this particular
scenario with PLAYER-CONF-DETECTOR; that is the reason
[11:05] <vralev> other topologies don't seem to be affected
[11:05] <vralev> so this is reproducible but it's not very easy to fix
[11:06] <oleg__> I would say impossible without reworking scheduler
[11:06] <vralev> oleg__: yes i want to try with the new scheduler
[11:06] <vralev> for which we need the controller part
[11:06] <oleg__> Vladimir, do you think that MGCP cannot be mapped to
309, or is that a problem?
[11:06] <vralev> this will solve a good portion of the 309 mystery
[11:08] <vralev> oleg__: for the tests that I care about right now, MGCP can
be mapped to 309, I am not really concerned about this in any way
because we can always extend MGCP commands to fill whatever is missing
[11:08] <vralev> just need to have the controller so I can run the load test
[11:08] <oleg__> ok, good
[11:09] <oleg__> so can we agree that the scheduling problem in MMS exists?
[11:09] <oleg__> I mean 2.1B1, which is used for the 309 tests
[11:09] <ivelin> what exactly will solve the mystery?
[11:09] <vralev> oleg__: I don't know yet if scheduling will fix the load tests
[11:10] <oleg__> if I understood correctly, you said the problem is
reproducible without load
[11:11] <vralev> ivelin: if the new scheduler solves the race
condition problem I see then this will unlock a big group of tests to
pass in batches
[11:11] <oleg__> Ivelin, the mystery is unpredictable behaviour due to
mis-scheduling
[11:11] <vralev> ivelin: as opposed to seeing them fail randomly
[11:12] <oleg__> Vladimir, what does "racing" mean for a single thread?
[11:13] <ivelin> when would we know if the new scheduler provides consistency?
[11:13] <oleg__> Ivelin, we know it now. See stats above
[11:14] <vralev> oleg__: I am not sure it is a scheduling problem; it
is still racing because you basically implemented multithreading
inside a single thread, like green threads
[11:14] <oleg__> how is it?
[11:15] <vralev> oleg__: I don't know how it happens, it is difficult to
trace precisely, I will get back to you on that
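Vladimir's point that a single-threaded scheduler can still "race" holds for any cooperative (green-thread style) design: if a task reads shared state, yields, and later acts on that read, another task can run in between and make the read stale. A minimal Python sketch using generators as tasks; it is illustrative only, not MMS code, and the task and track names are hypothetical stand-ins for the player/conference/detector scenario:

```python
from collections import deque

def run_round_robin(tasks):
    """Cooperative scheduler on one thread: resume each generator in turn."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)          # run the task until its next yield point
            queue.append(task)  # not finished: requeue at the back
        except StopIteration:
            pass                # task completed

def select_track(state, track):
    """Hypothetical task: check for a selected track, yield, then select one."""
    current = state["track"]   # read shared state
    yield                      # another task may run here
    if current is None:        # decision based on a possibly stale read
        state["track"] = track

def final_track(interleaved):
    state = {"track": None}
    a = select_track(state, "prompt.wav")
    b = select_track(state, "conference-mix")
    if interleaved:
        run_round_robin([a, b])  # both tasks read before either writes
    else:
        run_round_robin([a])     # a runs to completion before b starts
        run_round_robin([b])
    return state["track"]

print(final_track(interleaved=False))  # prompt.wav
print(final_track(interleaved=True))   # conference-mix
```

There is only one OS thread, yet the outcome depends on the order in which the scheduler interleaves the tasks' yield points. This is why such a bug can show up in some runs and not others, as with the wrong audio track in 70% of the runs.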
[11:15] == abhayani [~abhayani@115.252.103.219] has joined #mobicents
[11:16] <vralev> oleg__: if the new scheduler fixes it - then great though
[11:17] <ivelin> let's move on with the agenda.
[11:17] <ivelin> #3 MSS 1.6, 2.0
[11:20] <jeand> ok
[11:20] <jeand> regarding MSS 2.0
[11:20] <jeand> #1 MSS in the cloud - Built RPMs of MSS and the SIP LB,
built EC2 images and got Amazon EC2 access. The target for next week is
running the cluster built by BoxGrinder and cantierre on Amazon. Started
discussions about management
[11:20] <jeand> #2 MSS 2.X and CTF - CTF is code complete for the
first release, targeting the 1.0 release and the MSS 2.0 release for the
2nd week of April
[11:20] <jeand> regarding MSS 1.6
[11:21] <jeand> made good progress and closing the gap on the number
of issues reported, but there are still about 10 issues to fix
[11:21] <jeand> http://www.mobicents.org/mss-roadmap.html
[11:21] <jeand> before we are able to move on to completing the features
(mainly SNMPv3 support)
[11:22] <jeand> the issues left are not simple ones (related to B2BUA
forking and a security stress test)
[11:23] <jeand> we also fixed a bug on JBCP 1.2.X
[11:23] <jeand> that's all from my side
[11:25] <ivelin> nice. Did Eucalyptus deployment work?
[11:25] <ivelin> supposedly it's AWS compatible
[11:25] <jeand> not yet
[11:26] <jeand> you have to transform the AMI into EMI
[11:26] <jeand> and it's a manual step
[11:26] <jeand> so the first target is AWS now
[11:26] <jeand> since we have to go through it anyway to do
Eucalyptus deployments
[11:26] <jeand> I saw some commits today from Marek
[11:26] <jeand> regarding eucalyptus
[11:27] <jeand> maybe they fixed stuff so that we can directly create an EMI
[11:27] == slegrik [~pslegr@186.136.broadband9.iol.cz] has quit [Ping
timeout: 276 seconds]
[11:31] <ivelin> it's great that Marek is helping. He's great.
[11:31] <ivelin> #4 Diameter 1.4
[11:31] <jeand> right indeed
[11:31] == slegrik [~pslegr@186.136.broadband9.iol.cz] has joined #mobicents
[11:32] <@alexandrem> We released 1.4.0.BETA1 yesterday
[11:32] <@alexandrem> with better cluster functionality and
fine-grained data replication; it is more performant, stable and compliant
[11:33] <@alexandrem> next goals are: setting up HA/FT scenarios with
SLEE/MSS, setting up load tests at hudson, continue the blog series
[11:34] <@alexandrem> and finish the TLS support work
[11:36] <ivelin> there are currently no HA scenarios tested on Hudson?
[11:36] <@baranowb> only in unit tests
[11:36] <@baranowb> and only diameter
[11:37] <@alexandrem> there's at least one HA scenario for each
application (which has stateful mode)
[11:39] <jeand> HA tests are for standalone mode right ?
[11:39] <@baranowb> yes, we did not move further with the MSS example
[11:39] <@baranowb> by standalone you mean only diameter ?
[11:40] <ivelin> what is the plan for implementing distributed ha
tests like the ones for MSS?
[11:40] <jeand> I mean no SLEE or MSS involved
[11:41] <jeand> ivelin, if we could move that to junit tests that would be best
[11:41] <@alexandrem> jeand: yes, standalone.. just the Diameter
client and server in standalone
[11:41] <jeand> maintenance of this thing is a pain and there are often
race conditions
[11:44] <@alexandrem> ivelin: first we will analyze common message
flows, detect possible points of failure and create such tests
[11:45] <ivelin> ok, that would be great. There are more new leads on
Diameter. It would be good marketing to blog about the HA tests.
[11:46] <ivelin> #5 JSLEE 2.4
[11:46] <@baranowb> ok, Ed seems to be still gone
[11:46] <@baranowb> so 2.4.0.CR1 "SAKURA" has been released
[11:47] <@baranowb> luckily Eduardo managed to solve all problems with
the JSIP stack (thanks to Jean, I think, not sure)
[11:47] <@baranowb> and JSLEE supports early dialog failover
[11:47] <@baranowb> also since new cluster is used
[11:48] <@baranowb> further goodies in this release are:
[11:48] <@baranowb> - new cluster, tha
[11:48] <@baranowb> - support for starting a SLEE node in a live cluster (this
includes changes in RA code to conform to the specs a bit more)
[11:48] <@baranowb> - minor upgrades in RA deps, like XCAP/HTTP
[11:48] <@alexandrem> - EclipSLEE support for RA Type and RA creation
[11:49] <@baranowb> - new subscribe features in SIP subscribe enabler
and XDM enabler
[11:49] <@baranowb> - minor fixes in SLEE tools + big update in eclipSLEE
[11:49] <@baranowb> in 2 days
[11:50] <@baranowb> it has ~60 downloads I think
[11:50] <@baranowb> maybe more now
[11:50] <@baranowb> development for the next release on my side is concentrated
around the enablers and their features
[11:51] <@baranowb> not sure what Eduardo was doing, but I think he
was working on presence, to port it to 2.4
[11:51] <@baranowb> alex anything to add ?
[11:51] <@alexandrem> guess that's it
[11:52] <ivelin> yes, I forgot Eduardo is meeting a prospect today
[11:52] == jeffprestes [~jeffprest@nat/redhat/x-eazlkqgayushmvqy] has
joined #mobicents
[11:52] <ivelin> although I thought they would be done by now
[11:52] <ivelin> #7 SS7
[11:53] <abhayani> k. All the bugs for 8.1 are fixed.
[11:53] <@baranowb> #6 quick update - Ed is porting to the new JSLEE, I
think; I've been digging into xcap diff
[11:53] <@baranowb> #7
[11:54] <abhayani> we have worked with Dialogic native code
[11:54] <abhayani> and successfully simulated the test environment for
USSD messages
[11:55] <abhayani> last step of testing before till MAP stack + MAP RA
before releasing 8.1
[11:55] <@baranowb> rephrase
[11:56] <abhayani> last step of testing MAP stack + MAP RA before releasing 8.1
[11:56] <abhayani> the docs update is also remaining
[11:58] <ivelin> 8.1?
[11:58] <ivelin> when did Mobicents SS7 reach v8.1?
[11:58] <abhayani> release 8.1
[11:58] <@baranowb> B8
[11:59] <slegrik> shouldn't we go with B9 instead of 8.1 ?
[11:59] <slegrik> sounds confusing to me
[11:59] <abhayani> nope, the last release we had was B7
[11:59] <abhayani> and we decided to divide B8 into 2
[11:59] <abhayani> 8.1 and 8.2
[12:00] <abhayani> this was to expedite release 8.1 which has SCTP so
community can start using it without SS7 hardware
[12:00] <abhayani> 8.2 is about new messages for MAP + some CLI fixes
[12:00] <abhayani> + ISUP
[12:01] <ivelin> B9
[12:02] <slegrik> so it is gonna be SS7-1.0.0.Beta8.1 and SS7-1.0.0.Beta8.2 ?
[12:02] <ivelin> no need to split betas because they are not official releases
[12:02] <abhayani> yes Pavel.
[12:02] <slegrik> have never seen that
[12:02] <ivelin> just make it B8 and B9
[12:02] <abhayani> ivelin, we will have long B's ;)
[12:02] <abhayani> so I thought of breaking it into 2 rather than using a new one
[12:04] <ivelin> no biggie
[12:04] <jeand> I have to go
[12:04] <abhayani> k in that case will release it as B8
[12:05] <ivelin> thanks, Jean
[12:05] <ivelin> #8 QE
[12:06] <ivelin> Luis?
[12:07] <barreiro> Besides the JBCP 5.1 testing already mentioned in #1, I
have worked with Alex on setting up the build of EclipSLEE on Hudson
[12:08] <ivelin> how did the perf tests do this weekend?
[12:10] <barreiro> perf tests ran fine.
[12:10] <barreiro> They seem to be back on track :)
[12:11] <ivelin> nice. Regressions detected?
[12:12] <barreiro> there were a couple of failures due to some goals
that were adjusted... will fix them for the next run
[12:12] <barreiro> There was no regression.
[12:13] <ivelin> the new goals were too tight?
[12:14] <barreiro> yes. Some parameters change more from test to test
than others.
[12:16] <ivelin> interesting. The goal is to find parameters that
don't need to change much so we can avoid false test failures
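The notes do not say how the Hudson perf goals are set. One common way to avoid false failures on noisy parameters is to derive each goal from that parameter's own run-to-run spread instead of using a fixed threshold. A sketch of that idea, assuming a lower-is-better metric such as latency; the function names, the sample numbers and k = 3 are hypothetical, not taken from the project's test setup:

```python
from statistics import mean, stdev

def goal_limit(history, k=3.0):
    """Upper acceptance bound for a lower-is-better metric: mean + k * sample stdev."""
    return mean(history) + k * stdev(history)

def exceeds_goal(value, history, k=3.0):
    """True if value is worse than what the past runs make plausible."""
    return value > goal_limit(history, k)

stable = [100, 101, 99, 100, 100]  # hypothetical latency samples (ms), small spread
noisy = [80, 120, 100, 90, 110]    # same mean, much larger spread
print(exceeds_goal(110, stable), exceeds_goal(110, noisy))  # True False
```

The same 110 ms result is flagged for the stable parameter but tolerated for the noisy one. A single fixed goal would either miss real regressions in the first case or produce false failures in the second.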
[12:18] <ivelin> ok, are we done with the agenda?
[12:21] <ivelin> thanks everyone for joining
