WAP problem

conseo 4consensus at web.de
Tue Sep 18 23:06:47 EDT 2012


Hey Mike,

> 
> > ... The problem is using PreparedStatementCache as the ResultSet
> > depends on the statement. (1) Both queries (and clients) work when I
> > create a new PreparedStatement for each query. This cannot be solved
> > by synchronization if the ResultSet object leaves it, which it does
> > here. ...
> 
> If you want, you can mark your HarvestCache.get() method as:
> 
>   @ThreadRestricted("holds HarvestCache.this.getDatabase()")
> 
> The client is then responsible for synchronizing until finished with
> the result set.  Here's an example:
> http://zelea.com/project/votorola/_/javadoc/votorola/a/count/CountTable.Poll
> View.html#getByIndeces%28%29

Ok, yes. I have missed that note. I have just misused the ResultSet and this 
has nothing directly to do with the mentioned synchronization problem. My 
fault. I just stumbled upon it by hitting the closed ResultSet every 
time, which means that already every second query would is slowed down by the 
Database.class lock.

> 
> Another approach is to pass in a runner that does the client work:
> http://zelea.com/project/votorola/_/javadoc/votorola/a/count/CountTable.Poll
> View.html#runVoters%28java.lang.String,%20java.lang.String,%20votorola.a.cou
> nt.CountNodeW.Runner%29
> > ... We actually don't need to synchronize around Database for the
> > connection. (2) Using a single connection is maybe an unnecessary
> > limitation (in regard to code complexity) when using a pool of
> > connections and let them do synchronization.
> 
> Right.  Although we have to synchronize all use of a cached prepared
> statements, otherwise the driver is supposed to be thread safe.  It
> never used to be documented as such, but lately it is.  See the note:
> http://zelea.com/project/votorola/_/javadoc/src-html/votorola/g/sql/Database
> .html#line.15

Yes, but not around the db. You can also pool them, see below.

> 
> Currently it's fast enough, so we don't need to optimize.  Later we
> could remove the unecessary synchronization.  That would be step 1.
> 
> Step 2 might be connection pooling, as you say.  But that won't help
> with cached statements, which must *still* be synchronized.  Right?

Yes. But I only had this problem so reliably because all db access is 
synchronized around a single Database object, which defeats any multithreading 
when it comes to the db. I assume at least for the WAPs and the counting 
routines this makes our code single-threaded if the DB is the bottleneck 
(which it will be imo definetly once it writes to the disk, since all writes 
block all reads to any table and vice versa, HarvestWAP and its kick pendant 
already fulfill this criterium). This is also not recommended, Tomcat ships a 
basic set of commons-dbcp for this reason. (1)

No, the statements have a pool.(2) We don't need to cache a single statement 
per sql-query, the pool will do as long as we use the '?' to parameterize the 
statements in a common way as we already do.
All we need to do is use the PoolablePreparedStatement which we acquire and 
set the sql query-string each time, the pool will then automatically reload or 
create the PreparedStatement (which is immediatly prepared in the db for 
execution and can be repeated as we both know) like you do atm. by hand with a 
HashMap in Database.class. 
We need to close this statement though, which makes it transparently go back 
to the pool, dito for the pseudo-closed connection. We don't need to 
synchronize around Connection, PreparedStatement or ResultSet, we just might 
not close (or reuse, which is difficult in this setup and I could have used 
the ResultSet without it being closed) any of them before the ResultSet is 
finished as far as I understand the current state of the JDCB-API. I have read 
that on the interwebs that previous version of JDBC drivers were not 
threadsafe, but this is claimed to be no longer the case and Postgres is a 
very nice peace of technology imho, so I don't think we do something untested 
here. 

Prepared can also mean on the dbms side here, which is also a postgres option 
and could even be combined. (3)  This makes the meaning of prepared-statement 
a bit confusing.

> So we'd have to choose the most effective optimization, caching or
> async execution (pooling) or something else, depending on the specific
> bottleneck we're trying to clear.  (Optimization should be done in
> specific contexts with a profiler attached and the measurements, old
> and new, documented in the code.)

Well this is not exclusive. In fact we can use several connection objects and 
caching of statements or synchronization around result set. The problem that I 
see is that all connections to WAP will be effectively serialized with all 
other database access in the servlet jvm instance. Basically this defeats any 
multithreading for HarvestWAP (e.g. different servlet requests and parallel 
responses). I agree with profiling, but I have already measured the problem in 
my bug accidentially.

I think we already have a bottle neck with the WAPs and counting. It is 
unrecommended to synchronize that way and will as I tried to outline hurt the 
performance of the HarvestWAP service as well as any other service using the 
db more or less (because one of the will cause the bottleneck in some query 
and all other services using the db follow). The problem I also fear is that 
it covers our own synchronization bugs by synchronizing around something slow 
and global. We have to get it right anyway at least roughly and I don't think 
it will be more than 50 lines of code changes to votorola.g.sql.Database and 
some of the mentioned adjustments to the *Table classes. From then on only our 
code and queries determine scalability. (4) I can have a look into it once I 
am finished with the first version of the TalkTrack, because it effects 
HarvestWAP.

c

(1) https://tomcat.apache.org/tomcat-7.0-doc/jndi-datasource-examples-
howto.html#Non-DBCP_Solutions
(2) 
https://commons.apache.org/dbcp/api-1.4/org/apache/commons/dbcp/PoolingConnection.html
https://commons.apache.org/pool/
(3) http://jdbc.postgresql.org/documentation/head/server-prepare.html
(4) https://lwn.net/Articles/497069/ see also the json functions and embedded 
js



More information about the Votorola mailing list