WAP problem

Michael Allan mike at zelea.com
Wed Sep 19 06:42:37 EDT 2012


> > So we'd have to choose the most effective optimization, caching or
> > async execution (pooling) or something else, depending on the
> > specific bottleneck we're trying to clear.  ...
> 
> Well this is not exclusive. In fact we can use several connection objects and 
> caching of statements or synchronization around result set. ...

You could be right.  I guess we need to dig (when the time comes) and
pin down the thread restricted object.  Is it the connection, prepared
statement, result set, or something else?  Currently we assume the
worst, which is the correct thing to do.
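
To make the worst-case assumption concrete, here is a minimal sketch in
plain Java.  It is a toy model, not JDBC and not Votorola's code:
ToyStatement, ToyCursor and ThreadRestrictedDemo are hypothetical names.
The only JDBC behaviour it borrows is that re-executing a statement
closes the result set it returned earlier, which is why a client of a
cached statement has to hold the lock until it is finished with the rows.

```java
/** Toy stand-in for a cached prepared statement (hypothetical, not JDBC). */
final class ToyStatement {
    private ToyCursor current; // at most one open cursor per statement

    /** Re-executing closes the previously returned cursor. */
    ToyCursor execute(int rows) {
        if (current != null) current.close();
        current = new ToyCursor(rows);
        return current;
    }
}

/** Toy stand-in for a result set over `rows` rows. */
final class ToyCursor {
    private final int rows;
    private int position = 0;
    private boolean closed = false;

    ToyCursor(int rows) { this.rows = rows; }

    void close() { closed = true; }

    /** Advances one row; fails if the cursor was closed underneath the reader. */
    boolean next() {
        if (closed) throw new IllegalStateException("cursor closed");
        return position++ < rows;
    }
}

final class ThreadRestrictedDemo {
    /** Holds the lock until the rows are consumed, so nobody can re-execute. */
    static int countRowsLocked(Object lock, ToyStatement st, int rows) {
        synchronized (lock) {
            ToyCursor c = st.execute(rows);
            int n = 0;
            while (c.next()) n++;
            return n;
        }
    }

    /** Lets the cursor escape the lock; a later execute() then closes it. */
    static boolean escapedCursorFails(Object lock, ToyStatement st) {
        ToyCursor c;
        synchronized (lock) { c = st.execute(3); }
        synchronized (lock) { st.execute(3); } // stands in for another client's query
        try {
            c.next();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object lock = new Object();
        ToyStatement st = new ToyStatement();
        System.out.println(countRowsLocked(lock, st, 5));
        System.out.println(escapedCursorFails(lock, st));
    }
}
```

If the restricted object turns out to be only the statement (and its
result set), the lock in this sketch could be the statement itself
rather than something global.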

> ... The problem that I see is that all connections to WAP will be
> effectively serialized with all other database access in the servlet
> jvm instance. Basically this defeats any multithreading for
> HarvestWAP (e.g. different servlet requests and parallel
> responses). ...

Largely yes.  But this is not a real problem at the moment, and it can
be fixed later without too much disruption.
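
One possible shape of that later fix, sketched under assumptions: each
request borrows its own resource from a small pool instead of locking
one shared object.  TinyPool and PoolDemo are hypothetical names, and
the pooled objects are plain placeholders, not database connections.  A
real pool such as commons-dbcp does much more (validation, eviction,
statement pooling).  The demo only shows that several threads can be
inside their borrowed section at once, which a single global lock would
not allow.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

/** Minimal fixed-size pool (hypothetical sketch). */
final class TinyPool<T> {
    private final BlockingQueue<T> idle;

    TinyPool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) idle.add(factory.get());
    }

    /** Borrows a resource, runs the work, and always gives the resource back. */
    <R> R with(Function<T, R> work) throws InterruptedException {
        T resource = idle.take(); // blocks only when every resource is in use
        try {
            return work.apply(resource);
        } finally {
            idle.add(resource);
        }
    }

    int idleCount() { return idle.size(); }
}

final class PoolDemo {
    /** True if n threads were all inside their borrowed section at the same time. */
    static boolean allInsideTogether(int n) throws InterruptedException {
        TinyPool<Object> pool = new TinyPool<>(n, Object::new);
        CyclicBarrier barrier = new CyclicBarrier(n); // trips only when n threads wait together
        AtomicInteger met = new AtomicInteger();
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            threads[i] = new Thread(() -> {
                try {
                    Boolean ok = pool.<Boolean>with(resource -> {
                        try {
                            barrier.await(5, TimeUnit.SECONDS);
                            return true;
                        } catch (Exception e) {
                            return false; // timed out, interrupted, or barrier broken
                        }
                    });
                    if (ok) met.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        return met.get() == n && pool.idleCount() == n;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(allInsideTogether(4));
    }
}
```

Whether this buys anything in practice would still have to be measured
with a profiler, as argued further down.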
 
> > ... (Optimization should be done in specific contexts with a
> > profiler attached and the measurements, old and new, documented in
> > the code.)
>
> ... I agree with profiling, but I have already measured the problem
> in my bug accidentally. ...  I think we already have a bottleneck
> with the WAPs and counting. ...

Counting is now fixed.  It was slowing down my work on the polltrack,
so I fixed it.  We now count 500 votes/s on the server, 1000 on a fast
workstation, which is acceptable at this stage.  The optimization here
was different from what WAP will eventually need.  Count processes
(unlike web) are single threaded and write huge amounts of data in a
short period of time.

> ... I can have a look into it once I am finished with the first
> version of the TalkTrack, because it affects HarvestWAP.

You're welcome to, C, although I recommend waiting till we have a real
problem with it, e.g. when services are actually too slow.  Meantime I
think the bigger danger is premature optimization:
https://www.google.com/search?q=%22premature+optimization%22

But if it interests you, please go ahead.

-- 
Michael Allan

Toronto, +1 416-699-9528
http://zelea.com/


conseo said:
> Hey Mike,
> 
> > 
> > > ... The problem is using PreparedStatementCache as the ResultSet
> > > depends on the statement. (1) Both queries (and clients) work when I
> > > create a new PreparedStatement for each query. This cannot be solved
> > > by synchronization if the ResultSet object leaves it, which it does
> > > here. ...
> > 
> > If you want, you can mark your HarvestCache.get() method as:
> > 
> >   @ThreadRestricted("holds HarvestCache.this.getDatabase()")
> > 
> > The client is then responsible for synchronizing until finished with
> > the result set.  Here's an example:
> > http://zelea.com/project/votorola/_/javadoc/votorola/a/count/CountTable.PollView.html#getByIndeces%28%29
> 
> Ok, yes. I missed that note. I just misused the ResultSet, and this 
> has nothing directly to do with the mentioned synchronization problem. My 
> fault. I just stumbled upon it by hitting the closed ResultSet every 
> time, which means that every second query would already be slowed down by 
> the Database.class lock.
> 
> > 
> > Another approach is to pass in a runner that does the client work:
> > http://zelea.com/project/votorola/_/javadoc/votorola/a/count/CountTable.PollView.html#runVoters%28java.lang.String,%20java.lang.String,%20votorola.a.count.CountNodeW.Runner%29
> > 
> > > ... We actually don't need to synchronize around Database for the
> > > connection. (2) Using a single connection is maybe an unnecessary
> > > limitation (in regard to code complexity) when using a pool of
> > > connections and let them do synchronization.
> > 
> > Right.  Although we have to synchronize all use of cached prepared
> > statements, otherwise the driver is supposed to be thread safe.  It
> > never used to be documented as such, but lately it is.  See the note:
> > http://zelea.com/project/votorola/_/javadoc/src-html/votorola/g/sql/Database.html#line.15
> 
> Yes, but not around the db. You can also pool them, see below.
> 
> > 
> > Currently it's fast enough, so we don't need to optimize.  Later we
> > could remove the unnecessary synchronization.  That would be step 1.
> > 
> > Step 2 might be connection pooling, as you say.  But that won't help
> > with cached statements, which must *still* be synchronized.  Right?
> 
> Yes. But I only had this problem so reliably because all db access is 
> synchronized around a single Database object, which defeats any multithreading 
> when it comes to the db. I assume at least for the WAPs and the counting 
> routines this makes our code single-threaded if the DB is the bottleneck 
> (which it definitely will be imo once it writes to the disk, since all writes 
> block all reads to any table and vice versa; HarvestWAP and its kick 
> counterpart already fulfill this criterion). This is also not recommended; 
> Tomcat ships a basic set of commons-dbcp for this reason. (1)
> 
> No, the statements have a pool. (2) We don't need to cache a single statement 
> per sql-query; the pool will do as long as we use the '?' to parameterize the 
> statements in a common way, as we already do.
> All we need to do is use the PoolablePreparedStatement, which we acquire and 
> set the sql query-string each time; the pool will then automatically reload or 
> create the PreparedStatement (which is immediately prepared in the db for 
> execution and can be repeated, as we both know) like you do atm. by hand with 
> a HashMap in Database.class. 
> We need to close this statement though, which makes it transparently go back 
> to the pool, ditto for the pseudo-closed connection. We don't need to 
> synchronize around Connection, PreparedStatement or ResultSet; we just must 
> not close (or reuse, which is difficult in this setup and I could have used 
> the ResultSet without it being closed) any of them before the ResultSet is 
> finished, as far as I understand the current state of the JDBC API. I have 
> read on the interwebs that previous versions of JDBC drivers were not 
> threadsafe, but this is claimed to be no longer the case and Postgres is a 
> very nice piece of technology imho, so I don't think we do something untested 
> here. 
> 
> Prepared can also mean on the dbms side here, which is also a postgres option 
> and could even be combined. (3)  This makes the meaning of prepared-statement 
> a bit confusing.
> 
> > So we'd have to choose the most effective optimization, caching or
> > async execution (pooling) or something else, depending on the specific
> > bottleneck we're trying to clear.  (Optimization should be done in
> > specific contexts with a profiler attached and the measurements, old
> > and new, documented in the code.)
> 
> Well this is not exclusive. In fact we can use several connection objects and 
> caching of statements or synchronization around result set. The problem that I 
> see is that all connections to WAP will be effectively serialized with all 
> other database access in the servlet jvm instance. Basically this defeats any 
> multithreading for HarvestWAP (e.g. different servlet requests and parallel 
> responses). I agree with profiling, but I have already measured the problem in 
> my bug accidentally.
> 
> I think we already have a bottleneck with the WAPs and counting. It is 
> not recommended to synchronize that way and will, as I tried to outline, hurt 
> the performance of the HarvestWAP service as well as any other service using 
> the db more or less (because one of them will cause the bottleneck in some 
> query and all other services using the db follow). The problem I also fear is 
> that it covers our own synchronization bugs by synchronizing around something 
> slow and global. We have to get it right anyway, at least roughly, and I don't 
> think it will be more than 50 lines of code changes to votorola.g.sql.Database 
> and some of the mentioned adjustments to the *Table classes. From then on only 
> our code and queries determine scalability. (4) I can have a look into it once 
> I am finished with the first version of the TalkTrack, because it affects 
> HarvestWAP.
> 
> c
> 
> (1) https://tomcat.apache.org/tomcat-7.0-doc/jndi-datasource-examples-howto.html#Non-DBCP_Solutions
> (2) 
> https://commons.apache.org/dbcp/api-1.4/org/apache/commons/dbcp/PoolingConnection.html
> https://commons.apache.org/pool/
> (3) http://jdbc.postgresql.org/documentation/head/server-prepare.html
> (4) https://lwn.net/Articles/497069/ see also the json functions and embedded 
> js


