WAP problem
conseo
4consensus at web.de
Tue Sep 18 23:06:47 EDT 2012
Hey Mike,
>
> > ... The problem is using PreparedStatementCache as the ResultSet
> > depends on the statement. (1) Both queries (and clients) work when I
> > create a new PreparedStatement for each query. This cannot be solved
> > by synchronization if the ResultSet object leaves it, which it does
> > here. ...
>
> If you want, you can mark your HarvestCache.get() method as:
>
> @ThreadRestricted("holds HarvestCache.this.getDatabase()")
>
> The client is then responsible for synchronizing until finished with
> the result set. Here's an example:
> http://zelea.com/project/votorola/_/javadoc/votorola/a/count/CountTable.Poll
> View.html#getByIndeces%28%29
Ok, yes. I have missed that note. I have just misused the ResultSet and this
has nothing directly to do with the mentioned synchronization problem. My
fault. I just stumbled upon it by hitting the closed ResultSet every
time, which means that already every second query would is slowed down by the
Database.class lock.
>
> Another approach is to pass in a runner that does the client work:
> http://zelea.com/project/votorola/_/javadoc/votorola/a/count/CountTable.Poll
> View.html#runVoters%28java.lang.String,%20java.lang.String,%20votorola.a.cou
> nt.CountNodeW.Runner%29
> > ... We actually don't need to synchronize around Database for the
> > connection. (2) Using a single connection is maybe an unnecessary
> > limitation (in regard to code complexity) when using a pool of
> > connections and let them do synchronization.
>
> Right. Although we have to synchronize all use of a cached prepared
> statements, otherwise the driver is supposed to be thread safe. It
> never used to be documented as such, but lately it is. See the note:
> http://zelea.com/project/votorola/_/javadoc/src-html/votorola/g/sql/Database
> .html#line.15
Yes, but not around the db. You can also pool them, see below.
>
> Currently it's fast enough, so we don't need to optimize. Later we
> could remove the unecessary synchronization. That would be step 1.
>
> Step 2 might be connection pooling, as you say. But that won't help
> with cached statements, which must *still* be synchronized. Right?
Yes. But I only had this problem so reliably because all db access is
synchronized around a single Database object, which defeats any multithreading
when it comes to the db. I assume at least for the WAPs and the counting
routines this makes our code single-threaded if the DB is the bottleneck
(which it will be imo definetly once it writes to the disk, since all writes
block all reads to any table and vice versa, HarvestWAP and its kick pendant
already fulfill this criterium). This is also not recommended, Tomcat ships a
basic set of commons-dbcp for this reason. (1)
No, the statements have a pool.(2) We don't need to cache a single statement
per sql-query, the pool will do as long as we use the '?' to parameterize the
statements in a common way as we already do.
All we need to do is use the PoolablePreparedStatement which we acquire and
set the sql query-string each time, the pool will then automatically reload or
create the PreparedStatement (which is immediatly prepared in the db for
execution and can be repeated as we both know) like you do atm. by hand with a
HashMap in Database.class.
We need to close this statement though, which makes it transparently go back
to the pool, dito for the pseudo-closed connection. We don't need to
synchronize around Connection, PreparedStatement or ResultSet, we just might
not close (or reuse, which is difficult in this setup and I could have used
the ResultSet without it being closed) any of them before the ResultSet is
finished as far as I understand the current state of the JDCB-API. I have read
that on the interwebs that previous version of JDBC drivers were not
threadsafe, but this is claimed to be no longer the case and Postgres is a
very nice peace of technology imho, so I don't think we do something untested
here.
Prepared can also mean on the dbms side here, which is also a postgres option
and could even be combined. (3) This makes the meaning of prepared-statement
a bit confusing.
> So we'd have to choose the most effective optimization, caching or
> async execution (pooling) or something else, depending on the specific
> bottleneck we're trying to clear. (Optimization should be done in
> specific contexts with a profiler attached and the measurements, old
> and new, documented in the code.)
Well this is not exclusive. In fact we can use several connection objects and
caching of statements or synchronization around result set. The problem that I
see is that all connections to WAP will be effectively serialized with all
other database access in the servlet jvm instance. Basically this defeats any
multithreading for HarvestWAP (e.g. different servlet requests and parallel
responses). I agree with profiling, but I have already measured the problem in
my bug accidentially.
I think we already have a bottle neck with the WAPs and counting. It is
unrecommended to synchronize that way and will as I tried to outline hurt the
performance of the HarvestWAP service as well as any other service using the
db more or less (because one of the will cause the bottleneck in some query
and all other services using the db follow). The problem I also fear is that
it covers our own synchronization bugs by synchronizing around something slow
and global. We have to get it right anyway at least roughly and I don't think
it will be more than 50 lines of code changes to votorola.g.sql.Database and
some of the mentioned adjustments to the *Table classes. From then on only our
code and queries determine scalability. (4) I can have a look into it once I
am finished with the first version of the TalkTrack, because it effects
HarvestWAP.
c
(1) https://tomcat.apache.org/tomcat-7.0-doc/jndi-datasource-examples-
howto.html#Non-DBCP_Solutions
(2)
https://commons.apache.org/dbcp/api-1.4/org/apache/commons/dbcp/PoolingConnection.html
https://commons.apache.org/pool/
(3) http://jdbc.postgresql.org/documentation/head/server-prepare.html
(4) https://lwn.net/Articles/497069/ see also the json functions and embedded
js
More information about the Votorola
mailing list