Large Data Sets (3/3)

Now that we have discussed paging and sorting, let’s talk about making the experience a little better for your user. Our first step is to extend the paging solution so that the DataProvider looks ahead when the user is approaching the end of a page. After that we’ll improve the performance of the DataGrid by turning off “live scrolling” so that the DataGrid doesn’t try to update while the user is dragging the scrollbar. Finally we’ll build on that DataGrid to provide context information to the user so they can navigate more accurately. The examples can be found here and are a superset of what we did in articles 1 and 2.


Lookahead

The paging solution introduced in the first article is useful in many situations: it reduces the amount of data put on the wire at once and lets the user interact with your app sooner. However, it’s possible to improve things even more by loading pages before the user needs them. When the DataProvider notices that someone has requested an item within a certain threshold of a page boundary, it goes ahead and loads the neighboring page. By the time the user actually needs that page it will already exist, having been brought down in the background. This solution is great when a user is moving through a list sequentially, looking at a few records at a time. However, if the usage model is based on a lot of random access this technique may not be appropriate, as you won’t gain much by downloading a page the user is unlikely to see.

Example 6 shows the lookahead solution using a new DataProvider implementation called LookaheadPagingDataProvider. Much of the code is the same as SimplePagingDataProvider; there’s just been some refactoring and the addition of lookahead logic. If you run the example you’ll see that I log whether a page is loaded based on a Request (the user wants to view the data now) or based on Lookahead. Lookahead works at both ends of a page: if you’re near the top of a page we’ll look to load the previous one, and if you’re near the bottom we’ll look to load the next one.
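To make the threshold idea concrete, here is a minimal sketch of lookahead paging. It is written in Python purely for illustration and is not the code from Example 6: the class name `LookaheadPager`, the `fetch_page` callback, and the `threshold` parameter are all hypothetical, and pages are loaded synchronously here, whereas a real DataProvider would request them in the background.

```python
class LookaheadPager:
    """Hypothetical stand-in for a lookahead paging provider (not the Flex class)."""

    def __init__(self, fetch_page, total, page_size=50, threshold=5):
        self.fetch_page = fetch_page    # assumed callback: page index -> list of rows
        self.total = total              # total number of rows on the server
        self.page_size = page_size
        self.threshold = threshold      # rows from a page edge that trigger lookahead
        self.pages = {}                 # page index -> rows already loaded
        self.log = []                   # (reason, page index) for every load

    def _load(self, page, reason):
        # Ignore pages that do not exist or that are already on the client.
        last_page = (self.total - 1) // self.page_size
        if 0 <= page <= last_page and page not in self.pages:
            self.pages[page] = self.fetch_page(page)
            self.log.append((reason, page))

    def get_item_at(self, index):
        if not 0 <= index < self.total:
            raise IndexError(index)
        page, offset = divmod(index, self.page_size)
        self._load(page, "request")                 # the user needs this page now
        if offset < self.threshold:                 # near the top: previous page
            self._load(page - 1, "lookahead")
        elif offset >= self.page_size - self.threshold:   # near the bottom: next page
            self._load(page + 1, "lookahead")
        return self.pages[page][offset]


def fake_fetch(page, page_size=10):
    # Stand-in for a server call; returns the rows belonging to one page.
    start = page * page_size
    return [f"row {i}" for i in range(start, start + page_size)]


pager = LookaheadPager(fake_fetch, total=100, page_size=10, threshold=2)
pager.get_item_at(15)   # middle of page 1: plain request
pager.get_item_at(18)   # near the bottom of page 1: looks ahead to page 2
pager.get_item_at(10)   # near the top of page 1: looks ahead to page 0
print(pager.log)
```

The log plays the same role as the Request/Lookahead logging described above: it lets you see which loads were forced by the user and which were speculative.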

Live Scrolling

One issue when using a DataGrid with lots of data is that it may be slow to scroll when using the scrollbar. Remember that every time the DataGrid wants to display data it calls getItemAt on the DataProvider, and if you’ve looked at the source code you can see that there’s a lot more logic than simply returning an element in an array. This means that there’s a lot going on as you move the scrollbar “thumb” around, and performance can degrade. If you’ve played with the Flex layout controls you might have noticed that the DividedBox classes have a property called “liveDragging.” When this is true the UI will attempt to lay everything out again as the divider is dragged around. When false, the UI will wait until the divider has been released, making the dragging much faster. We can achieve noticeable performance gains if we apply this same logic to the DataGrid.

Example 7 is essentially the same as Example 6 except that I’ve created a subclass of DataGrid (ingeniously called FasterDataGrid) that allows you to disable live scrolling. Note that this implementation is only really possible because I have access to the source code, so I could see what I needed to do. It is probably not the best way to achieve this effect, but rest assured we are looking into adding this capability to our controls in a future release so you don’t have to. With that in mind, please note that this is “unsupported” and the underlying code that it relies on may change at any time. Consider yourself warned if you use this in your own code.
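The effect of turning off live scrolling can be sketched without any Flex internals. The Python below is an illustration only, not how FasterDataGrid is implemented: `ScrollView`, `on_thumb_drag`, and `on_thumb_release` are hypothetical names, and the fetch counter exists just to show how many getItemAt-style calls each mode makes.

```python
class ScrollView:
    """Hypothetical grid model that can defer row updates until the thumb is released."""

    def __init__(self, get_item_at, visible_rows=10, live_scrolling=True):
        self._get_item_at = get_item_at   # assumed callback: row index -> row data
        self.visible_rows = visible_rows
        self.live_scrolling = live_scrolling
        self.top = None                   # index of the first rendered row
        self.pending_top = None           # where the thumb currently is
        self.rows = []
        self.fetches = 0                  # number of calls made to the provider

    def _render(self, top):
        self.top = top
        self.rows = []
        for i in range(self.visible_rows):
            self.fetches += 1
            self.rows.append(self._get_item_at(top + i))

    def on_thumb_drag(self, top):
        # Always remember the thumb position; only redraw if live scrolling is on.
        self.pending_top = top
        if self.live_scrolling:
            self._render(top)

    def on_thumb_release(self):
        # With live scrolling off, this is the only place rows are fetched.
        if self.pending_top is not None and self.pending_top != self.top:
            self._render(self.pending_top)


data = list(range(200))

live = ScrollView(data.__getitem__, live_scrolling=True)
deferred = ScrollView(data.__getitem__, live_scrolling=False)
for view in (live, deferred):
    for position in range(100):      # simulate dragging the thumb through 100 positions
        view.on_thumb_drag(position)
    view.on_thumb_release()

print(live.fetches, deferred.fetches)
```

Both views end up showing the same rows, but the deferred one only asks the provider for the rows at the final position.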

ToolTips

The last improvement that I want to make is strictly for usability. When you’re dealing with a large amount of data it can be difficult to know where you are in relation to the rest of the data. If you’ve used recent versions of Microsoft Word you may have seen a feature where, as you drag the scrollbar, it shows you the page number and perhaps a heading to indicate where in the document you would land should you release the scrollbar there. I’ve tried to apply that idea to my last example. We’ll take the field that was used for sorting and create a complete list of values for that field that can be displayed when the user is moving the scrollbar. Since we’re only loading one field we don’t have to worry about creating thousands of complex objects; in our case the values are just Strings. In my experiments I was able to bring down all 20,000 values for the sort keys in approximately 8 seconds. That’s a lot faster than bringing down complete records, especially if you consider that a user will take a few seconds to think and process what they’re seeing before moving the scrollbar.

In order to allow user interaction while downloading, I page the sorted key list, but I use a much larger page size (7500 worked OK for me on my machine). I also don’t wait for the user to need a page from this list: as soon as the first page loads we go ahead and request the next one. Example 8 shows the implementation, with SortKeyProvider.as and ToolTipDataGrid.as being the important files. You may want to play with the SortKeyProvider pageSize attribute to see what works best for you, as the interface does pause while the keys are being read in.
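As a rough sketch of the idea (again in Python for illustration, not the contents of SortKeyProvider.as or ToolTipDataGrid.as), the key list can be filled one large page at a time while the tooltip lookup falls back to a placeholder for positions whose keys have not arrived. The names `SortKeyList`, `fetch_keys`, `load_next_page`, and `tooltip_for`, and the "Loading..." text, are assumptions made for this example.

```python
class SortKeyList:
    """Hypothetical eager, paged loader for the values of the sort field."""

    def __init__(self, fetch_keys, total, page_size=7500):
        self.fetch_keys = fetch_keys    # assumed callback: (start, count) -> list of keys
        self.page_size = page_size
        self.keys = [None] * total      # None marks a key that has not been loaded yet
        self.next_start = 0

    def load_next_page(self):
        """Load one page of keys; return True while more pages remain."""
        if self.next_start >= len(self.keys):
            return False
        chunk = self.fetch_keys(self.next_start, self.page_size)
        self.keys[self.next_start:self.next_start + len(chunk)] = chunk
        self.next_start += self.page_size
        return self.next_start < len(self.keys)

    def tooltip_for(self, scroll_position):
        """Text to show while the thumb is over the given row index."""
        key = self.keys[scroll_position]
        return key if key is not None else "Loading..."


# Stand-in for the server: 20,000 sorted keys served in slices.
all_keys = [f"name{i:05d}" for i in range(20000)]

def fetch(start, count):
    return all_keys[start:start + count]

provider = SortKeyList(fetch, total=20000)
provider.load_next_page()                 # first page of keys arrives
early = provider.tooltip_for(15000)       # that position is not loaded yet
while provider.load_next_page():          # keep requesting without waiting for the user
    pass
late = provider.tooltip_for(15000)
print(early, late)
```

In a real client each `load_next_page` call would be issued from the result handler of the previous one, so the user can keep scrolling between pages instead of waiting for a loop to finish.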

Conclusion

Many applications use a large amount of data, and unfortunately there isn’t a magic bullet for making everything ready the moment the user wants it. A common first step when working with large data is to retrieve it in pages. This allows the user to see data immediately and gives you the opportunity to download more in the background or simply wait until the user actually wants to see it. The user will often want to move around the data even if it is not all on the client, so it is up to you as the developer to provide appropriate navigation metaphors. One simple example is putting up a tooltip as the user scrolls so they can see what will be loaded when they stop. Remember, the user only needs the information that they can see. If you can deliver visible information quickly you will have the opportunity to download the details later in the process, when the user doesn’t mind a small delay.

This concludes the current series of discussions on large data sets. The topic will continue to come up regularly since I focus on the data side of Flex, but I do think about other things and would like to discuss them as well. Hope you’ll stay with me!