Primarily Vinod's techno babble, secondarily his rants, and hopefully some useful insights. A medium intended for learning by sharing experiences, honing programming skills and, with some luck, networking with like-minded enthusiasts.

Tuesday, August 17, 2010

When to use Java's Reader and when to rely on Streams?

Let me start with a guess. You are here for one of the following reasons:
  • you follow my blog posts
  • there is a bug in Google's page rank mechanism for some search terms and you landed here. Was the search term 'encoding' or 'multi-bytes' or 'xml' or 'inputstream'?
This post's primary aim is to ensure that some knowledge sinks into my brain. I have this undesirable trait of forgetting solutions to problems I had faced and fixed earlier. The solution, and at times the problem too, seems to escape from my brain after a short span of time (volatile RAM inside). So this entry is to etch the learning into my memory. If not, it will at least serve as a ready reckoner to look up solutions to problems I have seen in my past.

So what's the big deal with multi-byte characters? Seriously, I don't know. All I know is that there are quite a few varieties of character encodings out there. They propped up to fix some problem or the other with the various characters that are part of the plethora of languages in use today. An important piece of info that I had picked up somewhere was the existence of a character encoding called 'UTF-8', which is generally recommended because it can represent every character in the Unicode repertoire, using one to four bytes per character.
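To make that last bit concrete, here is a tiny sketch (my own illustration, not part of the problem that follows) which prints how many bytes a few sample characters occupy in UTF-8:

import java.io.UnsupportedEncodingException;

public class Utf8Sizes {
  public static void main(String[] args) throws UnsupportedEncodingException {
    // four sample characters needing 1, 2, 3 and 4 bytes respectively in UTF-8
    String[] samples = { "A", "\u00e9" /* e with acute */, "\u20ac" /* euro sign */, "\uD834\uDD1E" /* musical G clef */ };
    for (String s : samples) {
      System.out.println(s + " -> " + s.getBytes("UTF-8").length + " byte(s)");
    }
  }
}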

Here is the problem
There is an HTTP service that needs to be invoked. You are given a URL (with the required parameters on the query string) to invoke the service. The response is an XML stream that is well-formed and valid. The XML needs to be parsed, and some Java objects that already exist in the system need to be populated with the parsed data.

Now the solution that was adopted:
I used the Apache commons-httpclient library's HttpPost and HttpClient to post the HTTP request to the specified endpoint; you could use Sun's URLConnection instead if you fancy that. A StAX parser would have been fine for the job, but the output XML had to be physically saved for a cron job, which would process the XML file later. The InputStream, as normally recommended, was wrapped inside a BufferedInputStream. Data was read from the buffered stream and written to a FileOutputStream via an OutputStreamWriter, with the XML written out through the write(int) method. Apache's Digester library was used later, in the cron, to parse the XML and populate the Java objects. Here is the code snippet that does what was described above.
InputStream in = [API Call](url, params);
BufferedInputStream bis = new BufferedInputStream(in);
FileOutputStream fos = new FileOutputStream(filename);
OutputStreamWriter outWriter = new OutputStreamWriter(fos, "UTF-8");
int numRead;
// read() returns one byte at a time; write(int) then treats that single byte
// as if it were a complete character
while ((numRead = bis.read()) != -1) {
  outWriter.write(numRead);
}
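For completeness, the later cron stage that digests the saved file looked roughly like the sketch below. The element names and the Drug bean are made up for illustration; the real Digester rules obviously depend on the service's XML schema.

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.digester.Digester;

public class DrugFeedCron {

  // hypothetical bean; the real system already had its own java objects to populate
  public static class Drug {
    private String name;
    private String code;
    public void setName(String name) { this.name = name; }
    public void setCode(String code) { this.code = code; }
    public String toString() { return code + " - " + name; }
  }

  private final List<Drug> drugs = new ArrayList<Drug>();

  public void addDrug(Drug d) { drugs.add(d); }

  public static void main(String[] args) throws Exception {
    DrugFeedCron cron = new DrugFeedCron();
    Digester digester = new Digester();
    digester.push(cron);                                   // the object the parsed beans get added to
    digester.addObjectCreate("drugs/drug", Drug.class);    // one Drug per <drug> element
    digester.addBeanPropertySetter("drugs/drug/name", "name");
    digester.addBeanPropertySetter("drugs/drug/code", "code");
    digester.addSetNext("drugs/drug", "addDrug");          // hand the finished bean to the cron
    digester.parse(new File("feed.xml"));                  // the file written by the snippet above
    System.out.println(cron.drugs);
  }
}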
There was nothing wrong with the approach, but because of one piece of ignorance about the implementation, the solution did not work properly for multi-byte characters. The service's XML output document explicitly specified its encoding as 'UTF-8', so when the service was accessed via the browser, the data was rendered properly by the user agent. But when I opened the XML file on my disk through the same browser, some characters came up garbled. I inferred that something was going wrong while writing the XML from the response stream into the file. I remembered running into this same problem a few years back, but could not recollect what I had done then to fix it.

The encoding on both streams was rightly 'UTF-8', so it was puzzling why the code would not work for multi-byte strings. I was reading from the properly encoded stream via the read() method and writing that byte out to another properly encoded stream. It dawned on me to investigate the implementation of BufferedInputStream's read() method. The javadoc screamed that this method reads only one byte at a time from the stream. Bingo! That seemed to be the cause of the problem whenever a single character came in as a two- or multi-byte representation: when I wrote out a byte, I was writing out only a part of that multi-byte character and not its complete representation. Having identified the root cause, it was easier to fix. I simply chunked the reading to consider many bytes at a time. I thought about my chunk size for a while. What value should I use? Some googling suggested that UTF-8 characters are between 1 and 4 bytes long, so 1*2*3*4 = 24 seemed a good byte array size to start with, and any multiple of 24 should be fine, I supposed. I set my chunking limit accordingly, converted each byte[] into a properly encoded String, and sent that String to the writer to spit out the XML. Voila! It worked.

But I was not fully convinced that the solution was exact. What if a chunk boundary fell such that a character with a multi-byte representation got split across two chunks? Some more searching on the internet and I realized that when dealing with character data, a BufferedReader is the most suitable class for the job; streams are meant for binary data.
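That worry is, in fact, easy to demonstrate. Here is a tiny sketch (written after the fact, purely for illustration) of what happens when a chunk boundary cuts a UTF-8 character in half: each half decodes to a replacement character.

import java.io.UnsupportedEncodingException;
import java.util.Arrays;

public class SplitChunkDemo {
  public static void main(String[] args) throws UnsupportedEncodingException {
    byte[] utf8 = "abcd\u00e9".getBytes("UTF-8");     // the accented 'e' occupies 2 bytes, 6 bytes in total

    // pretend the chunk boundary fell right in the middle of the 2-byte character
    byte[] chunk1 = Arrays.copyOfRange(utf8, 0, 5);
    byte[] chunk2 = Arrays.copyOfRange(utf8, 5, 6);

    // decoding each chunk on its own mangles the split character
    System.out.println(new String(chunk1, "UTF-8"));  // abcd followed by a replacement char
    System.out.println(new String(chunk2, "UTF-8"));  // another lone replacement char
  }
}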
If only I had remembered this solution a few hours earlier, this post might not have fructified at all!
Anyway, the modified code snippet that uses a reader is below:
InputStream in = [API Call](url, params);
FileOutputStream fos = new FileOutputStream(filename);
OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8");
// the reader decodes the UTF-8 bytes into complete characters for us
BufferedReader br = new BufferedReader(new InputStreamReader(in, "UTF-8"));
String fileline;
while ((fileline = br.readLine()) != null) {
  osw.write(fileline);
  osw.write(System.getProperty("line.separator")); // readLine() strips the line terminator
}
osw.close(); // flush so the last buffered characters actually reach the file
I wonder how the reader classes have the intelligence to assemble characters correctly given the encoding. Will definitely look that up in the JDK sometime ... hopefully ;-)
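Until I do, here is my current understanding (unverified against the JDK source, so treat it as an assumption): InputStreamReader hands the bytes to a java.nio.charset.CharsetDecoder, and the decoder simply refuses to consume an incomplete byte sequence until the next read supplies the rest of it. A small sketch of that behaviour:

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

public class DecoderUnderflowDemo {
  public static void main(String[] args) {
    CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
    CharBuffer out = CharBuffer.allocate(16);
    ByteBuffer in = ByteBuffer.allocate(16);    // plays the role of the reader's internal byte buffer

    in.put((byte) 0xC3).flip();                 // the first 'read' delivers only half of a 2-byte char (0xC3 0xA9)
    decoder.decode(in, out, false);             // UNDERFLOW: the lone byte is left unconsumed
    System.out.println("decoded so far: [" + out.duplicate().flip() + "]");  // prints []

    in.compact();                               // keep the leftover byte at the front of the buffer
    in.put((byte) 0xA9).flip();                 // the next 'read' supplies the second half
    decoder.decode(in, out, true);
    decoder.flush(out);
    System.out.println("decoded now   : [" + out.flip() + "]");             // prints the full character
  }
}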

Saturday, July 24, 2010

Having ANT talk to SVN for sophisticated Unit Testing

Recently I encountered a situation where there was a requirement to execute only the newly written JUnit tests for the product. For some inexplicable reason, the team had gotten into an undesirable habit of not unit testing. Some of the older JUnit tests, written a few years back by a different set of people, were broken or irrelevant, and nobody in the team had the time or inclination to go through the old ones and understand them, let alone fix them. That is a pretty understandable human trait, isn't it? After all, how many times have you wanted to take a closer look at code that you cranked out a year earlier?
Another reason for wanting to discard the older tests was the need to measure the effect of the team's newly adopted processes on code quality. I am thinking it would be a nice idea to also track the effect of these changes on a release-by-release basis; if the code-quality trend improved, we would have a quantitative measure of the impact of the process changes the team had embarked on. Let me guess: you are now wondering what changes we were making to our style of working.
  • For all newly added classes, all public methods need to have unit tests
  • For any modified code with a public method, a new test would be written, or a pre-existing test would be corrected to be meaningful in the current context of the method in that class
  • Code reviews by senior engineers in the team who really know their stuff. The intention was to cover at least the services being exposed; the UI and controller layer code are being given a lower priority for now. That decision will probably be revisited in future iterations.
  • Create a more harmonious environment of learning and competing by honing one's programming skills. This seems to be the toughest challenge!
Having decided on the 'what' to measure, the next obvious step was to figure out the 'how'. Given the timeline range for a release, I needed to know:
  • all the modified classes
  • all the newly added classes
  • all the newly created tests
  • all the modified tests
Armed with this information, I planned to execute only those JUnit tests that were of interest to me. Let's say I wanted to run only the newly created tests, so that the JUnit report is not skewed by the failures of pre-existing tests. This would help me find out whether there was indeed a relationship between the quantum of newly written unit tests and the bugs per KLOC across previous releases and the current one. With a positive correlation (more tests --> fewer bugs from QA), you might think that one of the consequences of the earlier sloppy code was getting fixed. You do not have to be an Einstein to figure out the 'why' part now: a happy set of developers (due to fewer bugs), a happier set of managers (a more productive team due to fewer bugs and better code quality), and the happiest me (because I would have the numbers to prove the claims from the dev side).

How do we identify all the newly added test classes? Enter SvnKit, the Java library that can talk to the SVN source control system. The SvnKit wiki pages were quite helpful, and their examples let me quickly understand the API usage. Initially I ran into a problem when I tried using the custom "svn://" protocol to access the SVN repository. The error message that came back offered no help whatsoever in identifying the cause; it spewed pure gibberish as far as my grey cells could fathom. Thanks to my very knowledgeable colleague Ganeshji, the problem was quickly identified as a jar version mismatch: the SvnKit jars I had used were old and could not work with the more recent SVN client on my machine.

With the upgraded SvnKit jars, I wrote a sort of SVN connector class, SVNConnector.java, to ferret out the four lists I was interested in. The API provides a straightforward way to access a file's latest revision details, which include the last modified time, so getting at the lists of modified classes and tests was a cakewalk. The tricky part was determining whether a file was added during our iteration period (within a specified date range). Again the SvnKit wiki came to my rescue; I got an idea from one of the samples that showed how to traverse the repository tree, and that worked. The current hack is, I am pretty sure, inefficient and slow, but for lack of better ideas/expertise with the API, I settled for it to get the job done first.
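To give a flavour of it, here is a minimal sketch of the kind of SvnKit calls the connector relies on. The URL, credentials and test path are placeholders, and the method names are from the SvnKit low-level API as I recall it, so treat this as an outline rather than the actual SVNConnector.java (which is in the code linked at the end of the post).

import java.util.Date;

import org.tmatesoft.svn.core.SVNDirEntry;
import org.tmatesoft.svn.core.SVNURL;
import org.tmatesoft.svn.core.internal.io.svn.SVNRepositoryFactoryImpl;
import org.tmatesoft.svn.core.io.SVNRepository;
import org.tmatesoft.svn.core.io.SVNRepositoryFactory;
import org.tmatesoft.svn.core.wc.SVNWCUtil;

public class LastChangeLookup {
  public static void main(String[] args) throws Exception {
    SVNRepositoryFactoryImpl.setup();   // registers support for the svn:// protocol

    SVNURL url = SVNURL.parseURIEncoded("svn://host/repo");   // placeholder repository URL
    SVNRepository repository = SVNRepositoryFactory.create(url);
    repository.setAuthenticationManager(
        SVNWCUtil.createDefaultAuthenticationManager("user", "password"));   // placeholder credentials

    // the entry for a path carries its last-changed revision and date, which is all
    // that is needed to decide whether the file was touched within the iteration window
    SVNDirEntry entry = repository.info("tests/com/acme/FooTest.java", -1);  // -1 = HEAD
    Date lastChanged = entry.getDate();
    System.out.println(entry.getName() + " last changed on " + lastChanged
        + " in r" + entry.getRevision());
  }
}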

The next step was to take the list of newly added tests from the SVNConnector and feed it to the ANT task that would execute the tests. To do that I had to make ANT talk to my list, which was not as straightforward as I had initially imagined. I had planned to write my own ANT task, but it turned out that what I needed was a specialized ANT FileSelector, not a custom task. I wrote one, and on the first run it broke, as expected. There was a problem in the way I was trying to pass values to my custom FileSelector through separate nested param elements; that was incorrect, and the values had to be passed in as attributes of the selector element instead. Next I ran into a classpath problem: my SVNConnector class was not visible to ANT. I cursed scripting and its many followers. I finally figured out how to specify the path properly, and then my connector was being executed via my custom FileSelector.

But the first error-free run would not end at all, and I had to kill the process. I wanted to debug and find out what was happening. To my utter dismay, after a couple of hours of googling I gave up trying to set up the debugger for an ANT-executed class. The trouble seems to be that ANT starts the program in a different VM from the one Eclipse runs in. In my futile attempts I tried getting ANT to spawn a new JVM listening on a remote port that I could point my Eclipse debugger at. Hard luck, or I missed something. Nevertheless, printing to the console was the improvised debugging technology I had to fall back on, and despite being reduced to such impoverished means I finally managed to have the custom FileSelector work just as I wanted.
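The selector itself boils down to something like the sketch below (a simplified stand-in for the class in the attached code, with a made-up attribute name): the list of newly added tests is handed in through an attribute on the selector element, and ANT then asks the selector a yes/no question for every file in the fileset.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.tools.ant.BuildException;
import org.apache.tools.ant.types.selectors.FileSelector;

public class NewTestSelector implements FileSelector {

  private final Set<String> newTests = new HashSet<String>();

  // maps to a 'testlist' attribute on the selector element,
  // e.g. <newtestselector testlist="build/new-tests.txt"/>
  public void setTestlist(String testListFile) {
    try {
      BufferedReader reader = new BufferedReader(new FileReader(testListFile));
      String line;
      while ((line = reader.readLine()) != null) {
        newTests.add(line.trim());
      }
      reader.close();
    } catch (IOException e) {
      throw new BuildException("Could not read the test list " + testListFile, e);
    }
  }

  // ANT calls this once for every candidate file in the fileset
  public boolean isSelected(File basedir, String filename, File file) {
    return newTests.contains(filename.replace(File.separatorChar, '/'));
  }
}

In the build file the selector gets registered with a typedef (pointing at a classpath that includes the connector and the SvnKit jars, which was the part that tripped me up) and is then nested inside the fileset that feeds the junit task.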
It has been a while since I had that satisfaction of creating something not done before. Maybe someone has already done something on this front; if so, Google has turned a blind eye to such information on the web, and hopefully all you folks reading this will make the search engines take notice of this page.

The code (the connector and the ANT custom FileSelector) is here. No worries, no IP violation, no license, zilch.
Disclaimer : Use it at your own risk

Wednesday, March 3, 2010

Analyzing a web application for its user experience quality

So you did believe after all that it might be worthwhile to meander over to my lair on the net and check out what's on my mind. Thanks for the confidence, and I shall try to keep your grey matter engaged.
I intend to use this blogging medium for primarily 2 reasons:

  1. Sharing what I learn across the different consultancy assignments I undertake, so that I might attract some intelligent folks who could add to my understanding of the topic. I firmly believe that the collective wisdom of the public will any day be superior to my own incisive analysis of whatever topic I might fancy. It might also help someone else with a similar task, or simply make some hapless soul browsing for info aware of possible usability pitfalls in his/her web application.
  2. As a means of consolidating some of my observations. I've noticed that when I write, my brain does extra processing to sort out the pieces of the jumble and see the essential picture that's at the crux of the model. Basically I am consolidating my knowledge and, in the process, doing myself a favor :-)
OK that should suffice for a pre-introduction ;-)

Being a software consultant, one of my recent assignments was to assess the usability of a suite of web applications in the health care domain. There was a secondary goal related to the performance of some pages in some apps of the suite. These apps were meant to be used by drug manufacturers to publish their drug details, which physicians could then use as an electronic and/or paper reference.
I had never before cared to keep a keen eye out for usability issues in the software I work with, and I was quite surprised at the number of issues that struck me as I went about my analysis. Listing all of them would make this post unwieldy, so in the interest of brevity (guys, I value and respect your time, hence the strategy of terseness) I am going to list only those deficiencies in the suite that I perceive as the bigger demons needing to be exorcised.

Each of these small apps was developed as a silo. You might wonder why? Umm, that's quite normal in our software development industry after all, where work gets outsourced to multiple vendors and each vendor/team gets to work on a part of the solution. Now that's reasonable and normally should not be a cause for concern. The problem crops up when the product development services vendor ends up treating the requirements spec as the bible and does not try to understand what problem is being solved. And therein lies the biggest reason for a solution to fail. Not asking 'What problem is being solved?' is probably the biggest mistake that was made in this case. Some vendors probably have the notion that going back to the client is likely to be perceived by the client as a general lack of smartness (blame it on cultural influences if you want to be diplomatic) and hence desist from asking too many questions. And to add to that, some clients (their business analysts or product/solution managers) do not like to be bothered once the requirements specs are handed out.
My reasoning for this conclusion is simple: when I counted the number of apps in the suite (it exceeded half a dozen) and thought about the problem these apps were trying to fix, I was like, 'WTF, why are so many apps needed in the first place?' All it needed was some internet-facing functionality (for manufacturers to add their new products to the compendium of drugs maintained by the system) to feed data into the system; that data then gets augmented with more metadata and data on the intranet side to support some basic workflow before it is persisted in the designed format for final consumption. It's not difficult to visualize the redundancies in multiple areas of the domain model (manufacturers, drugs, co-ordinators, etc.) that were being duplicated across these apps. User management, logging, styling and other cross-cutting functions were scattered all over the place, for lack of a plan to handle such components centrally across the applications of the suite.

The next important observation was the needless screen complexity in some of the apps. Clutter on the screen is never welcome, and in these cases there was an overload of information. One particular screen epitomized the violation of some very common UI guidelines recommended for better usability.
  • Multiple horizontal and vertical scroll bars on a single screen; even a single one ought to be shunned, let alone several.
  • A profusion of drop-downs that enable the revenge of the mice, thanks to the innocent-looking mouse scroll wheel!
  • Visited links would not be remembered
  • Pagination controls hidden by the horizontal scroll, since they sat in the bottom-right corner of the grid
  • Performance issues due to the huge amount of data being pulled in (~1.3MB in production)
  • The notion of a grid view was taken to its extreme: all columns of a row were shown at once (irrespective of the user role), a single data grid showed more than 300 records per page before pagination kicked in, and one of the grids had about 20-odd columns. I wonder what justified such a huge page size. There were 3 such grids on the screen!
  • The table column headers would not scroll horizontally, but the data would, leading to a mangled UI where it was painful to tell which column a row's value belonged to.
  • There was no link on the grid records to open their detail pages! Wait, the irony was that such a detail page already existed; only the linking between the two pages was never done, and I shudder at what motivation could have led to such a design :-(
  • Yet another irritant was the support for performing some actions directly on a grid record. Action icons were shown on all records irrespective of whether the action was valid for that record. Oh yes, thank God, error handling was there to save the day.
  • The craziest part: some actions could alter a record's state, making it incorrect to keep showing the record in that grid. To handle that, every such action posted data back to the server and the entire page, with its multiple grids, was reloaded. No points for guessing the severe performance degradation. The server sent more than a MB of data on every reload, which easily meant several tens of MBs per user in a typical session. Consequently, some users reported that the page never rendered completely even after a few minutes and they finally got a time-out error.
  • And here is the final punch: this page happens to be the landing page of the app.
There probably were some more such bloopers, but I am not able to recount them all now. Yes, the couple of weeks that the assignment spanned was probably the time I sighed the most in my life, having to deal with the UI-cum-performance-cum-design mess that the most important screen in the app was in. Unfortunately, I could not find answers to the questions I wanted to ask to understand whether there was any method behind this madness (the product had been sold to another company, which is how the current owners came to possess these apps).
My conclusion was this: don't just implement feature requests from "user representatives" or "business analysts". I am pretty sure nobody bothered to watch how the apps were actually being used as they were being built. My hunch is that end-users say what they want, but they probably end up using the features differently from how they think they will. Now combine this with a scenario where the IT services vendor does not question the requirements and believes that the requirements are frozen. Voila! Isn't that a catastrophe in the making?

IMHO, it is only by observing how end-users navigate around the app that its screens' UI elements can be refactored for better ease of use. It is really, really hard to visualize upfront how the features will be used and thereby plan and build optimal usability in the first go. No wonder we developers are proponents of the Agile development methodology, aren't we?


Honey, I've a brainwave. As I worked my way through the different apps in the suite, my laziness inspired an idea, and I am going to state it here and request your feedback; listening to your opinions might sharpen the idea and give it further form and direction. The idea is to build an application that can do a first-pass usability analysis automatically, based on a set of static usability rules and maybe some metrics too. It could act as a first-level usability check before further resources get committed to the usability aspects of a project, and it could also help identify problem areas that warrant focus.
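To give the idea a little bit of shape, here is the kind of skeleton I have in mind; the names and the single sample rule are purely illustrative, nothing is implemented yet.

import java.util.ArrayList;
import java.util.List;

public class UsabilityCheckerSketch {

  // a crude stand-in for however a page ends up being represented (HTML source, DOM snapshot, ...)
  public interface Page {
    String getUrl();
    String getHtml();
  }

  // one static rule = one named check that reports zero or more findings for a page
  public interface UsabilityRule {
    String getName();
    List<String> check(Page page);
  }

  // an illustrative rule in the spirit of the column-happy grids described above
  public static class TooManyColumnsRule implements UsabilityRule {
    public String getName() { return "too-many-grid-columns"; }

    public List<String> check(Page page) {
      List<String> findings = new ArrayList<String>();
      int headers = count(page.getHtml(), "<th");
      if (headers > 10) {
        findings.add(page.getUrl() + " renders " + headers + " column headers on one screen");
      }
      return findings;
    }

    private int count(String haystack, String needle) {
      int count = 0;
      for (int idx = haystack.indexOf(needle); idx != -1; idx = haystack.indexOf(needle, idx + needle.length())) {
        count++;
      }
      return count;
    }
  }
}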


A word of caution: the bane of IE 6's snail-like JavaScript engine was painfully obvious on one screen that was excruciatingly slow to load. IE 6 just could not stand up to heavy DOM manipulation via JavaScript. (If you know of tweaks and tricks to make IE 6 perform better, I'd be glad for the education!) This particular page took more than 5 minutes to render in IE 6, whereas Firefox/Chrome took less than 20 seconds. And you know what? The app was designed with IE 6 as the target browser.


That's it for my first post. I intend to show up here regularly, so look out for the next one, folks.