Primarily Vinod's techno babble and secondarily his rants and hopefully some useful insights. A medium intended for learning by sharing experiences to hone programming skills and hopefully also network with like-minded enthusiasts.

Tuesday, July 12, 2016

Hibernate OGM, OMG!

Last month I was working on a project that involved writing a batch-processing component. Having decided that MongoDB would be a good fit for my workload requirements, I wrote the solution in Java and used the mongo-java-driver to talk to the NoSQL datastore.

The going was smooth until I ran into a read query that would not work. I spent a day on it and finally, with some help from Stack Overflow, got past it. Click here if you want to know more about the issue I faced and how it was resolved. Midway through the struggle I was getting ready to switch my datastore to an RDBMS, though I was skeptical it would work, since I did not need the ACID guarantees (and their overhead) that come with an RDBMS. I was firing read queries at a rate of 500-700 per second, a workload for which a document-based NoSQL store or a Redis-style key-value store seemed ideal. Nevertheless, I did try out MySQL, writing a DAO to abstract the persistence logic. As suspected, MySQL was way too slow and could not keep up with my query throughput needs. Choosing MongoDB in the first place had indeed been the right call.
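The point of a DAO like the one mentioned above is that the rest of the code depends only on an interface, so switching datastores means writing one new implementation rather than touching every caller. Here is a minimal sketch of that idea, not the actual project code: `Session`, `SessionDao` and `InMemorySessionDao` are hypothetical names, and the in-memory implementation merely stands in for one that would wrap the mongo-java-driver or JDBC.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class DaoSketch {

    // Hypothetical record type; the fields are illustrative only.
    static final class Session {
        final String sourceIp;
        final long login;

        Session(String sourceIp, long login) {
            this.sourceIp = sourceIp;
            this.login = login;
        }
    }

    // Callers depend only on this interface, so a MongoDB-backed and a
    // MySQL-backed implementation can be swapped without touching them.
    interface SessionDao {
        void save(Session session);

        List<Session> findBySourceIp(String sourceIp);
    }

    // In-memory stand-in, safe for concurrent callers; a real implementation
    // would wrap the mongo-java-driver or JDBC instead.
    static final class InMemorySessionDao implements SessionDao {
        private final Map<String, List<Session>> byIp = new ConcurrentHashMap<>();

        @Override
        public void save(Session session) {
            byIp.computeIfAbsent(session.sourceIp, ip -> new CopyOnWriteArrayList<>())
                .add(session);
        }

        @Override
        public List<Session> findBySourceIp(String sourceIp) {
            // Return a copy so callers cannot mutate the stored list.
            return new ArrayList<>(byIp.getOrDefault(sourceIp, Collections.emptyList()));
        }
    }

    public static void main(String[] args) {
        SessionDao dao = new InMemorySessionDao();
        dao.save(new Session("10.0.0.1", 1000L));
        dao.save(new Session("10.0.0.2", 2000L));
        System.out.println(dao.findBySourceIp("10.0.0.1").size()); // prints 1
    }
}
```

A MySQL experiment then amounts to a second class implementing `SessionDao`, with the batch-processing code left untouched.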

Having got the solution to crunch the data at a very healthy pace (about 15 GiB in under 30 minutes on a 32-core, 1.6 GHz server with 24 GB of RAM), I now wanted to make it more extensible. So I decided to check whether there were any ORM equivalents in the NoSQL space. Indeed, the Hibernate folks had a nice one: Hibernate OGM (Object/Grid Mapper), and it supported JPA! Wow, that implied that if I could write my queries in JPQL, I would not have to learn the native query syntax of each individual NoSQL datastore. Enthused by this chance to do less work (rather than learn the myriad query syntaxes of various persistence stores), I started playing with OGM. Within a couple of hours, I was able to persist documents into MongoDB. Now that is the power of standards and the benefit of tools that comply with them! How many hours do we developers spend trying to do the same stuff with different tools across different projects? But for some issues with dates, I could have gotten it working in under an hour. I always seem to suffer with my dates (proof here), no pun intended!

Perked up by how little time the document insertion into MongoDB took, I was gearing up to implement the reads, updates and deletes. And that is when the cookie crumbled. One of the read queries checked whether an attribute was null, which required the IS NULL predicate. Here is the culprit:

SELECT p FROM pppoe_test p where p.sourceIP=:source_ip and p.login>:logentrytime and (p.logout>:logentrytime OR p.logout IS NULL)
The MongoDBBasedQueryParserService very respectfully informed me that it had 'no viable alternative at input NULL'. Uh oh, the language in that error message did not sound the least bit encouraging. Some more googling, plus a post on the ever-helpful Stack Overflow, suggested that I might have run into a bug. So I dutifully headed to the Hibernate OGM forum and have been waiting for a reply ever since. Not having received any response in the 2 days since I posted there, I have filed an issue in their JIRA. Hopefully, I will hear back from them either way.
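To make the intent of that query concrete, here is a plain-Java sketch of the same predicate applied to in-memory rows. The `Session` class and the use of long timestamps are assumptions for illustration, not the real entity. It also shows why the IS NULL branch matters: in JPQL/SQL a comparison against NULL is never true, so without it sessions that are still open (no logout yet) would silently drop out. If the parser keeps rejecting IS NULL, one possible stopgap (my suggestion, not verified against OGM) is to query on the conditions it does accept and apply the logout check in Java like this.

```java
import java.util.ArrayList;
import java.util.List;

public class OpenSessionFilter {

    // Hypothetical in-memory mirror of a pppoe_test row; timestamps are plain
    // longs here, and logout stays null while the session is still open.
    static final class Session {
        final String sourceIp;
        final long login;
        final Long logout;

        Session(String sourceIp, long login, Long logout) {
            this.sourceIp = sourceIp;
            this.login = login;
            this.logout = logout;
        }
    }

    // Mirrors: sourceIP = :source_ip AND login > :t AND (logout > :t OR logout IS NULL)
    static boolean matches(Session s, String sourceIp, long t) {
        return s.sourceIp.equals(sourceIp)
                && s.login > t
                // In JPQL/SQL, "logout > :t" is UNKNOWN (not TRUE) for a NULL logout,
                // so open sessions are kept only by the explicit null test. Testing
                // null first also avoids unboxing a null Long here in Java.
                && (s.logout == null || s.logout > t);
    }

    static List<Session> filter(List<Session> rows, String sourceIp, long t) {
        List<Session> result = new ArrayList<>();
        for (Session s : rows) {
            if (matches(s, sourceIp, t)) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Session> rows = new ArrayList<>();
        rows.add(new Session("10.0.0.1", 150L, null));  // still open: kept
        rows.add(new Session("10.0.0.1", 150L, 200L));  // closed after t: kept
        rows.add(new Session("10.0.0.1", 50L, null));   // logged in before t: dropped
        rows.add(new Session("10.0.0.2", 150L, null));  // other IP: dropped
        System.out.println(filter(rows, "10.0.0.1", 100L).size()); // prints 2
    }
}
```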

I debugged the code and realized that the issue lay in the AST (Abstract Syntax Tree) being constructed by the parser. Hmm, quite often the problems we face are ones where identifying the cause is the hard part and the fix takes comparatively little time. But this one seemed daunting and hence challenging. I am so tempted to dig further into it, but for mortals like me there are always other, more pressing tasks at work requiring attention. It would be awesome if, after reading this, you got inspired to try and fix the issue! Let me know, and who knows: in the best case we might get it fixed after all, or at least learn something in the process that makes us better at our trade!

Monday, July 4, 2016

Do you know how to handle your dates?

It's been quite a long time since I last blogged, and I have been thinking for a while that I should restart. So here comes a titbit, thanks to my recently dating, ahem, I mean dealing with, the ubiquitous SimpleDateFormat class in Java.

All of us have invariably used the handy SimpleDateFormat class from the JDK to whip a date into shape (ugh, java.util.Date, you minion, what else can you do with it?). While developing webapps, where every request typically runs on its own thread and creates its own formatter, this class seems perfectly safe to indulge with.
My dates with SimpleDateFormat, always a pleasant affair until then, suddenly turned into a trying relationship. I squarely blame it on my not understanding dates properly. Well, at least I was not aware of its thread-unsafe nature.
So, while burning the proverbial midnight oil to crunch log files on the order of a couple of dozen GB a day, I was spawning a few tens of threads. As part of the processing, I was converting java.util.Date objects into date strings for JSON. I was using a single static formatter object shared by all the threads, and then all hell broke loose: some of my dates were getting messed up while most were just fine.

My initial suspicion fell on the JSON library I was using for that purpose. So I switched to another JSON library and again ended up with the same messed-up dates once in a while. That incorrect suspicion cost me half a day. Only then did I wonder whether the hitherto-assumed-to-be-benign SimpleDateFormat could be the culprit. Google promptly confirmed my suspicion: it is indeed not thread-safe! It made me wonder why some JDK classes are not thread-safe by default.

Once I had identified the problem, the fix was quite simple. I could either synchronize the static formatting method I was using or make the formatter a member of the Runnable itself. I chose the latter because I did not want the formatting method to become a bottleneck in the heavy-duty processing I was doing. Problem solved, after wasting a day.
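The two fixes described above can be sketched as follows. This is an illustration rather than the original code: `FormatterFixes`, `LogWorker` and `workersAgreeWithReference` are hypothetical names, and the pattern is the one from the trouble-making snippet. The synchronized method keeps one shared formatter behind a lock, while the per-Runnable variant (the one I went with) gives each worker its own instance so no lock is needed at all.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class FormatterFixes {

    static final String PATTERN = "MMM dd HH:mm:ss z yyyy";

    // Option 1: keep one shared formatter but serialize access to it.
    // Correct, but every thread now queues on the same lock.
    private static final SimpleDateFormat SHARED = new SimpleDateFormat(PATTERN);

    public static synchronized String formatDateSynchronized(Date inputDate) {
        return SHARED.format(inputDate);
    }

    // Option 2: each Runnable owns its formatter, so no instance is ever
    // touched by two threads and no lock is needed.
    static final class LogWorker implements Runnable {
        private final SimpleDateFormat sdf = new SimpleDateFormat(PATTERN);
        private final Date[] dates;
        private final String[] out;

        LogWorker(Date[] dates, String[] out) {
            this.dates = dates;
            this.out = out;
        }

        @Override
        public void run() {
            for (int i = 0; i < dates.length; i++) {
                out[i] = sdf.format(dates[i]);
            }
        }
    }

    // Runs several LogWorkers concurrently and checks every result against
    // a fresh formatter used from this thread only.
    static boolean workersAgreeWithReference(int threadCount, int datesPerThread) {
        Date[][] inputs = new Date[threadCount][datesPerThread];
        String[][] outputs = new String[threadCount][datesPerThread];
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            for (int i = 0; i < datesPerThread; i++) {
                // One-hour steps, offset per thread, so threads format different dates.
                long hour = (long) t * datesPerThread + i;
                inputs[t][i] = new Date(1_000_000_000_000L + hour * 3_600_000L);
            }
            threads[t] = new Thread(new LogWorker(inputs[t], outputs[t]));
            threads[t].start();
        }
        try {
            for (Thread thread : threads) {
                thread.join(); // join() also makes the workers' writes visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        SimpleDateFormat reference = new SimpleDateFormat(PATTERN);
        for (int t = 0; t < threadCount; t++) {
            for (int i = 0; i < datesPerThread; i++) {
                if (!reference.format(inputs[t][i]).equals(outputs[t][i])) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(formatDateSynchronized(new Date(0L)));
        System.out.println(workersAgreeWithReference(8, 500)); // prints true
    }
}
```

The trade-off is the one mentioned in the post: option 1 is a one-word change but funnels all threads through a single lock, whereas option 2 costs one formatter per worker and scales with the thread count.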
If you can use Java 8, yet another option is the java.time.format.DateTimeFormatter class, which is immutable and thread-safe.
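With Java 8's java.time, a single shared static formatter is fine because DateTimeFormatter is immutable. Here is a minimal sketch, assuming the same pattern as the snippet that caused the trouble and an English locale (my assumption, pinned so month names are predictable); `ThreadSafeDates` and `formatDate(Date, ZoneId)` are hypothetical names. A legacy java.util.Date has to be converted to a zoned date-time first, because the pattern's zone-name and calendar fields cannot be read from a bare Instant.

```java
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;

public class ThreadSafeDates {

    // DateTimeFormatter is immutable and thread-safe, so one static
    // instance can be shared by all worker threads without locking.
    // Locale.ENGLISH is an assumption, chosen so month names are predictable.
    static final DateTimeFormatter FORMATTER =
            DateTimeFormatter.ofPattern("MMM dd HH:mm:ss z yyyy", Locale.ENGLISH);

    // Bridges a legacy java.util.Date into java.time; the "z" (zone name)
    // and year fields need a zoned value, hence atZone(...).
    static String formatDate(Date inputDate, ZoneId zone) {
        return FORMATTER.format(inputDate.toInstant().atZone(zone));
    }

    public static void main(String[] args) {
        System.out.println(formatDate(new Date(), ZoneId.systemDefault()));
    }
}
```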

Here is the code snippet that caused the trouble in a multi-threaded environment.

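// Shared by every thread: SimpleDateFormat keeps mutable internal state
// (a Calendar), so concurrent format() calls can corrupt each other's results.
public static SimpleDateFormat sdf = new SimpleDateFormat("MMM dd HH:mm:ss z yyyy");
public static String formatDate(Date inputdate) {
    String dt = sdf.format(inputdate);
    return dt;
}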

Here is a link on the same topic on SO where the problem is discussed.

Lesson #1: When using a library in a multi-threaded environment, DO NOT assume thread safety!
Lesson #2: RTFM instead of making assumptions.