Now it's confirmed with certainty that it's the datasource causing
this problem. We've limited it to the load-balancing in the datasource
configuration (The mysql connector can do load balancing when multiple
comma separated IP's are added to the connect string). If we use only
one of the cluster IP's in the configuration, the transactions work.
We are currently checking if there is some special transaction
handling being used when the driver does load balancing. I'll post the
solution here if/when we find it.
I guess your money is still on some other component trying to auto-
commit the global transaction? Or can you think of other reasons why
the transaction would be rolled back?
BR,
-Calle
On Apr 26, 5:50 pm, Eduardo Martins <emmart...@gmail.com> wrote:
> If by changing the datasource everything works fine it may be some
> configuration, perhaps it is trying to auto commit the tx? Also the
> fact you see rollback exceptions will for sure have an impact on SLEE,
> cause Arjuna is the underlying tx manager, and that may explain why
> the event is not fired (its tx gets a rollback). I will try to look
> at the log later today and let you known if I find anything unusual,
> but no promises. Meanwhile I recommend you to start digging the
> configuration of such datasource, and see if there is anything which
> clearly may be the cause.
>
> -- Eduardo
> ..............................................http://emmartins.blogspot.comhttp://redhat.com/solutions/telco
>
> 2011/4/26 Carl-Magnus Björkell <nrgizer...@gmail.com>:
>
>
>
>
>
>
>
> > Oh, and I forgot, the content of the databases are exactly the same
> > (cloned it just before re-running the test scenario, so that is not a
> > variable in this case)
>
> > -Calle
>
> > On Apr 26, 5:09 pm, Carl-Magnus Björkell <nrgizer...@gmail.com> wrote:
> >> Hey Eduardo,
>
> >> Thanks for the category list. I was trying to get more logging, but I
> >> wasn't really sure what categories to turn on, so I was flooded with
> >> too much stuff to make any sense of.
>
> >> Disclaimer: Hope this mess makes sense, I've kept the post open for
> >> half the day trying to write down what I find. Please bare with me.
>
> >> I tried to run the DEBUG traces, but didn't spot anything that could
> >> explain what happens, so I tried with the TRACE level. Since this
> >> trace is ginormous, I'm going to start with a snippet that I think
> >> might be the crucial parts, if you think that this doesn't contain the
> >> issue I'll try to get the full trace uploaded somewhere (pastebin/
> >> kwrite didn't want to accept/complete the full >2MiB paste :P )
>
> >>http://pastebin.com/MNXFDDxk(theIPs and usernames have been
> >> cencored, I hope :) )
>
> >> The line that caught my eye was: 10:21:12,443 TRACE [TxInterceptor]
> >> Setting tx as null and gtx as null ... this doesn't seem to be
> >> right..?
>
> >> Here is the same scenario again with the DEBUG level:http://pastebin.com/d8ACq1KY
>
> >> Just to give some context, what is happening here is that the call
> >> layer (which in this case is a single root Sbb) is firing our own
> >> custom CallStatusChangeEvent on a NullActivity ACI. The receiver is
> >> the AnnouncementSbb, but it never seems to actually get the event
> >> (Rollback happens before it gets there..?)
>
> >> It is worth noting that this is not thrown when executing the code
> >> snippet I have above. To be honest I'm not sure if I this is because I
> >> misread the trace the first time around, if the code has changed since
> >> then, or if this is a new issue with the same exception. In any case
> >> it's still a problem. In this case I'm much more confused though since
> >> I can't find any log message that would indicate that the
> >> CallStatusChangeEvent (our custom event) is ever even received in the
> >> attached Sbb. I've been staring at this log now for about an hour, and
> >> I can't seem to find any reason for why our code would be causing
> >> this...
>
> >> NOTE! We're using a load balanced JDBC (MySQL cluster) local-tx-
> >> datasource in our Sbb's. Database related code is not touched at all
> >> in relation to most of the arjuna RollbackExceptions that we see (yes,
> >> we've seen these before). However, the interesting thing is that the
> >> RollbackExceptions disappear completely if we change the datasource
> >> (by editing mysql-ds.xml) so that we use a single InnoDB standard
> >> mysql server instead. I don't have an explanation for this either. I
> >> just confirmed this by shutting down mobicents, editing *only* mysql-
> >> ds.xml and then restarting and running the scenario again. It worked
> >> without problems (announcement was played an call gracefully dropped).
>
> >> Working mysql-ds.xml entry:
> >> <local-tx-datasource>
> >> <jndi-name>DB</jndi-name>
> >> <use-java-context>false</use-java-context>
> >> <connection-url>jdbc:mysql://myownserver:3306/DB</connection-url>
> >> <driver-class>com.mysql.jdbc.Driver</driver-class>
> >> <user-name>ohmilols</user-name>
> >> <password>***</password>
> >> <exception-sorter-class-
> >> name>org.jboss.resource.adapter.jdbc.vendor.MySQLExceptionSorter</
> >> exception-sorter-class-name>
> >> <valid-connection-checker-class-
> >> name>org.jboss.resource.adapter.jdbc.vendor.MySQLValidConnectionChecker</
> >> valid-connection-checker-class-name>
> >> <metadata>
> >> <type-mapping>mySQL</type-mapping>
> >> </metadata>
> >> </local-tx-datasource>
>
> >> Non-working mysql-ds.xml entry:
> >> <local-tx-datasource>
> >> <jndi-name>DB</jndi-name>
> >> <use-java-context>false</use-java-context>
> >> <connection-url>jdbc:mysql://clusternode1:3306,clusternode2:3306/DB?
> >> useConfigs=clusterBase&secondsBeforeRetryMaster=60</connection-
> >> url>
> >> <driver-class>com.mysql.jdbc.Driver</driver-class>
> >> <user-name>ohmilols</user-name>
> >> <password>***</password>
> >> <exception-sorter-class-
> >> name>org.jboss.resource.adapter.jdbc.vendor.MySQLExceptionSorter</
> >> exception-sorter-class-name>
> >> <valid-connection-checker-class-
> >> name>org.jboss.resource.adapter.jdbc.vendor.MySQLValidConnectionChecker</
> >> valid-connection-checker-class-name>
> >> <metadata>
> >> <type-mapping>mySQL</type-mapping>
> >> </metadata>
> >> </local-tx-datasource>
>
> >> Any ideas? Any at all?
>
> >> BR,
> >> -Calle
>
> >> On Apr 25, 6:29 pm, Eduardo Martins <emmart...@gmail.com> wrote:
>
> >> > There may be lots of reasons for a tx commit failure, such as
> >> > concurrent txs setting same state with different values, or even a
> >> > hidden exception that results in some code raising the tx rollback
> >> > flag, in your case will need to adjust the logging to understand what
> >> > is going on, first lets try setting everything with debug level so
> >> > please set the log4j config file to:
>
> >> > <?xml version="1.0" encoding="UTF-8"?>
> >> > <!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
>
> >> > <log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/"
> >> > debug="false">
>
> >> > <appender name="CONSOLE" class="org.apache.log4j.ConsoleAppender">
> >> > <errorHandler class="org.jboss.logging.util.OnlyOnceErrorHandler"/>
> >> > <param name="Target" value="System.out"/>
> >> > <param name="Threshold" value="DEBUG"/>
> >> > <layout class="org.apache.log4j.PatternLayout">
> >> > <param name="ConversionPattern" value="%d{ABSOLUTE} %-5p [%c{1}] %m%n"/>
> >> > </layout>
> >> > </appender>
>
> >> > <category name="org.jboss.cache">
> >> > <priority value="DEBUG" />
> >> > </category>
>
> >> > <category name="javax.slee">
> >> > <priority value="DEBUG" />
> >> > </category>
>
> >> > <category name="org.mobicents">
> >> > <priority value="DEBUG" />
> >> > </category>
>
> >> > <root>
> >> > <level value="INFO"/>
> >> > <appender-ref ref="CONSOLE"/>
> >> > </root>
>
> >> > </log4j:configuration>
>
> >> > Please paste the console log since the SIP message arrives in SIP
> >> > stack, if nothing becomes clear we will go for the ultra packed TRACE
> >> > log for a few components.
>
> >> > -- Eduardo
> >> > ..............................................http://emmartins.blogspot.comhttp://redhat.com/solutions/telco
>
> >> > 2011/4/19 Carl-Magnus Björkell <nrgizer...@gmail.com>:
>
> >> > > Hey,
>
> >> > > That is the full stack trace. The server log file wouldn't do you any
> >> > > good since the only thing before this error is the reception of the
> >> > > INVITE in the SIP RA and then the logging that our internal event has
> >> > > been received in our custom Sbb.
>
> >> > > I'm sure we can bubblegum-fix this specific problem in one way or
> >> > > another, I'm more interested in the general "rules" for transactions
> >> > > and what might be causing errors of this sort. Hopefully someone has
> >> > > some insight into this, and is willing to share :)
>
> >> > > BR,
> >> > > -Calle
>
> >> > > On Apr 19, 7:21 pm, aayush <abhatnagar192...@gmail.com> wrote:
> >> > >> Carl..can you provide the full stack trace? Or better still - you can attach
> >> > >> the server.log file.
>
> >> > >> On Apr 19, 2011 9:05 PM, "Carl-Magnus Björkell" <nrgizer...@gmail.com>
> >> > >> wrote:
>
> >> > >> Well, spam is taking it a bit far I think :) I'm sorry if I came
> >> > >> across as rude, that was not my intention. You just misunderstood what
> >> > >> transactions I was talking about. Do give the SIP RFC's and JAIN SIP
> >> > >> specification a read, it should clarify a lot about what's going on
> >> > >> behind the scenes in the SIP RA.
>
> >> > >> Now, to get back on topic: Does anyone have an idea about what's going
> >> > >> on in the OP?
>
> >> > >> BR,
> >> > >> -Calle
>
> >> > >> On Apr 19, 6:05 pm, Teresa Reis <zizata...@gmail.com> wrote:
>
> >> > >> > Sorry for the trouble that I cause, ...
> >> > >> > 2011/4/19 Carl-Magnus Björkell <nrgizer...@gmail.com>
>
> >> > >> > > Please don't hijack my thread. This is not about SIP transactions, but
> >> > >> > > the SLE...