Last month I was working on a project which involved writing a batch-processing component. Having decided that MongoDB would be a good tool given my workload requirements I wrote the solution in Java and used a mongo-java driver to talk to the NoSQL datastore.
The going was smooth until I ran into a problem with a read query that would not work. Spent a day with the problem and finally with some help from stack-overflow was able to get over the problem. Click here if you want to know more about the issue I had faced and the resolution for the same. Midway through the struggle I was getting ready to switch my datastore to an RDBMS though I was skeptical if it work, considering that I did not need the ACIDs that go with an RDBMS. I was firing read queries at the rate of 500-700 per second and a document-based NoSQL or a key-value Redis style one would be ideal. But nevertheless I did try out MySQL. Wrote a DAO to abstract the persistence logic. As suspected, MySQL was way too slow and was not good enough for my querying throughout needs. Choosing MongoDB initially was indeed the right thing I had done.
Having got the solution to crunch the data at a very healthy pace (~15GiB under 30 mins on a 24GB RAM, 32 core core server at 1.6Ghz), now I wanted the solution to be more extensible. So decided to check out if there was any ORM equivalents in the No-SQL space. Indeed Hibernate themselves had a nice one. Its Hibernate OGM, Object-Graph Mapping and it supported JPA! Wow, that implied if I could write my queries in JP QL, I would not have to learn to write queries native to individual NoSQL datastores. All enthused by this option to do less work (than learn all the myriad querying syntax for various persistence stores), I started playing with the OGM. Within a couple of hours, I was able to persist the document into MongoDB. Now that is the power of standards and the benefits of tools complying to it! How many hours we developers spend trying to do the same stuff with different tools across different projects? But for some issues with dates, I could have gotten it to work under an hour. I always seem to suffer with my dates (proof here), no pun intended!
Perked up by the fact that the document insertion into MongoDB took such little time, I was gearing up to implement the reads, updates and deletes involved. And that is when the cookie crumbled. One of the read queries checked if an attribute was null and doing that necessitated the use of the idiom IS NULL. Here is the culprit:
The MongoDBBasedQueryParserService very respectfully informed me that it had 'no viable alternative at input NULL'. Uh oh, that language in the error message did not sound any bit encouraging. Some more googling and then after posting the issue on the ever helpful stackoverflow it appeared that I might have run into a bug. So dutifully headed into the Hibernate OGM forum and have been waiting for a reply since. Not having gotten any response since the 2 days I posted it, I have filed an issue in their JIRA. Hopefully, I would hear back from them either way.
I debugged the code and realized that the AST (Abstract Syntax Tree) object was being constructed by the parser was the issue. Hmm, quite often the category of problems that we face involve identifying the problem where the fix takes comparatively lesser time. But this one seemed daunting and hence challenging. I am so tempted to dig further into it, but then for mortals like me there are always other more pressing tasks at work requiring attention. It would be awesome if after reading this you get inspired to try and fix the issue! Let me know and who knows, we might get to fix it after all in the best case or at least learn some more stuff in the process which will make us better at our trade!
The going was smooth until I ran into a problem with a read query that would not work. Spent a day with the problem and finally with some help from stack-overflow was able to get over the problem. Click here if you want to know more about the issue I had faced and the resolution for the same. Midway through the struggle I was getting ready to switch my datastore to an RDBMS though I was skeptical if it work, considering that I did not need the ACIDs that go with an RDBMS. I was firing read queries at the rate of 500-700 per second and a document-based NoSQL or a key-value Redis style one would be ideal. But nevertheless I did try out MySQL. Wrote a DAO to abstract the persistence logic. As suspected, MySQL was way too slow and was not good enough for my querying throughout needs. Choosing MongoDB initially was indeed the right thing I had done.
Having got the solution to crunch the data at a very healthy pace (~15GiB under 30 mins on a 24GB RAM, 32 core core server at 1.6Ghz), now I wanted the solution to be more extensible. So decided to check out if there was any ORM equivalents in the No-SQL space. Indeed Hibernate themselves had a nice one. Its Hibernate OGM, Object-Graph Mapping and it supported JPA! Wow, that implied if I could write my queries in JP QL, I would not have to learn to write queries native to individual NoSQL datastores. All enthused by this option to do less work (than learn all the myriad querying syntax for various persistence stores), I started playing with the OGM. Within a couple of hours, I was able to persist the document into MongoDB. Now that is the power of standards and the benefits of tools complying to it! How many hours we developers spend trying to do the same stuff with different tools across different projects? But for some issues with dates, I could have gotten it to work under an hour. I always seem to suffer with my dates (proof here), no pun intended!
Perked up by the fact that the document insertion into MongoDB took such little time, I was gearing up to implement the reads, updates and deletes involved. And that is when the cookie crumbled. One of the read queries checked if an attribute was null and doing that necessitated the use of the idiom
SELECT p FROM pppoe_test p where p.sourceIP=:source_ip and p.login>:logentrytime and (p.logout>:logentrytime OR p.logout IS NULL)
I debugged the code and realized that the AST (Abstract Syntax Tree) object was being constructed by the parser was the issue. Hmm, quite often the category of problems that we face involve identifying the problem where the fix takes comparatively lesser time. But this one seemed daunting and hence challenging. I am so tempted to dig further into it, but then for mortals like me there are always other more pressing tasks at work requiring attention. It would be awesome if after reading this you get inspired to try and fix the issue! Let me know and who knows, we might get to fix it after all in the best case or at least learn some more stuff in the process which will make us better at our trade!
Thanks Vinod for all this detail. Truth is we are working on a new parser that better accommodate both ORM and OGM. And will give us the flexibility to write better SQL queries (shorter and hence less network intensive).
ReplyDeleteWe will try and look at your issue and fix it. Hopefully that would be a big hole to dig, we are mortals too ;)
Emmanuel for the Hibernate OGM team.
Thanks Emmanuel for the response. Apparently it wasnt even a hole, may be just a minor crack which had a work-around :-).
DeleteAs for the mortals thing, that is the only certainty in life! What I meant was folks like me whose job at office does not allow much freedom or time to contribute to opensource. Would be glad to though in some way.
Hi Vinod,
ReplyDeleteSo the issue is this parser is case sensitive for 'null', 'false' and 'true'. Just change your "NULL" to "null" and it will work correctly.
Working on a fix.
Thanks for your feedback!
Cool that worked! Thanks Guillaume. Ive updated the ticket with this info you had mentioned https://hibernate.atlassian.net/browse/OGM-1118
DeleteI pushed a PR to the hqlparser project. It will be fixed in the next version of OGM. Thanks for taking the time to report this issue!
Delete