Large Data Sets (Part 2 of 3?)

Our second topic for dealing with large data sets is sorting. The problem with sorting is that you need all of the data available in order to produce an accurate sort. However, as we saw in the last article, bringing all of the data down from the server to the client is incredibly expensive and therefore unrealistic. Even normal web applications rarely load all of their data from the database server to the app server when dealing with a large amount of it.

Instead of downloading all of the data to the client we’ll let the database take care of sorting for us (which is one of the things a database is very good at). Let’s build on our examples from the previous article and add sorting capability to the implicit paging solution. The examples can be found here and are a superset of what we did in article 1 (meaning that the old examples are there too).


Sorting

The first part of adding sorting support is in our data service. Remember that in the last example we had a ValueListService, which was similar to the ValueListHandler. I’ve now created a subinterface called SortedValueListService. The first method we care about is an extension of getElements called getSortedElements. We’ll keep this method simple and only allow sorting on a single field, plus a boolean indicating whether the sort should be in ascending or descending order. If you were to build on this you might change the sortField into some form of sort descriptor.

I’ve implemented this method in the CensusService and added the appropriate method to the CensusDAO. The CensusDAO is now much more closely tied to the MS SQL Server database: it uses a stored procedure to retrieve the data, because we need to keep track of row numbers, which requires temporary tables. The same would not be true in Oracle, which supports rownum natively, and I’m not sure about MySQL but I’d bet it’s not hard there either. My stored procedure is based on info I found on the 4GuysFromRolla.com site. The premise is the same as the normal getElements SQL; the only difference is that we need to throw sorting in and therefore can’t rely on the ID being the same as the row number.
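To make the row-number idea concrete, here is a rough sketch in TypeScript (which reads much like ActionScript). It is not the stored procedure or the Java service from the examples: the record shape, the field names, and the InMemoryCensusService class are all assumptions, and an in-memory array stands in for the database. It only shows the two steps described above: number the rows in sort order, then return the rows whose numbers fall inside the requested page.

```typescript
// Hypothetical record shape; the real CensusEntryVO fields are not shown here.
interface CensusRow {
  id: number;
  age: number;
  occupation: string;
}

type CensusField = "id" | "age" | "occupation";

// Assumed signature, modeled on the description of getSortedElements.
interface SortedValueListService {
  getSortedElements(start: number, count: number,
                    sortField: CensusField, ascending: boolean): CensusRow[];
}

// Numbers compare numerically, everything else as plain strings.
function compareValues(x: string | number, y: string | number): number {
  if (typeof x === "number" && typeof y === "number") {
    return x - y;
  }
  const sx = String(x);
  const sy = String(y);
  return sx < sy ? -1 : sx > sy ? 1 : 0;
}

// Illustrative stand-in for the service plus DAO; an array plays the database.
class InMemoryCensusService implements SortedValueListService {
  private rows: CensusRow[];

  constructor(rows: CensusRow[]) {
    this.rows = rows;
  }

  getSortedElements(start: number, count: number,
                    sortField: CensusField, ascending: boolean): CensusRow[] {
    // Step 1: the "temporary table". Copy the rows in sort order and give
    // each one a 0-based row number. Ties fall back to id so that paging
    // is stable from one call to the next.
    const numbered = this.rows.slice().sort((a, b) => {
      const primary = compareValues(a[sortField], b[sortField]);
      if (primary !== 0) {
        return ascending ? primary : -primary;
      }
      return a.id - b.id;
    }).map((row, rowNum) => ({ rowNum: rowNum, row: row }));

    // Step 2: keep only the rows whose row number falls inside the page.
    return numbered
      .filter((n) => n.rowNum >= start && n.rowNum < start + count)
      .map((n) => n.row);
  }
}
```

The tie-break on id is a design choice of this sketch: without some unique final sort key, rows with equal values could land on different pages from one request to the next.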

Now I need to update my Flex code to take advantage of sorting. Most of my changes go in the SimpleDataProvider (SDP). I’ve added an implementation of the sortItemsBy method, which is specified by the DataProvider "interface", but did not implement the sortItems method, as that uses a comparator function (which means needing all of the data on the client). I then assume that the data provider will always be retrieving sorted data, so I changed it to always call getSortedElements instead of getElements. The SDP maintains the sortField and the sortAscending boolean to pass to the server whenever a new page is needed. When the sort changes we first check whether all of the data is already available, in which case the sort can occur on the client. That rarely happens, though, so we essentially need to start over retrieving data from the server: we set the sortField and sortAscending variables, clear the list, and let the DataGrid simply ask for the correct data. You can see this implementation in Example 4.
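The data provider logic can be sketched roughly as follows, again in TypeScript rather than the ActionScript of Example 4. Everything here is an assumption made for illustration: the Person shape, the SortedPagingProvider class, the FetchSortedPage callback, and above all the synchronous fetch (the real provider gets its pages back asynchronously). What it tries to show is just the decision in sortItemsBy: sort locally if every row is already on the client, otherwise remember the new sort and throw the cached pages away.

```typescript
// Hypothetical row shape and field names.
interface Person { id: number; name: string; age: number; }
type PersonField = "id" | "name" | "age";

// Stand-in for the remote getSortedElements call (synchronous for brevity).
type FetchSortedPage = (start: number, count: number,
                        sortField: PersonField, ascending: boolean) => Person[];

function comparePeople(a: Person, b: Person,
                       field: PersonField, ascending: boolean): number {
  const x = a[field];
  const y = b[field];
  let primary: number;
  if (typeof x === "number" && typeof y === "number") {
    primary = x - y;
  } else {
    const sx = String(x);
    const sy = String(y);
    primary = sx < sy ? -1 : sx > sy ? 1 : 0;
  }
  if (primary !== 0) {
    return ascending ? primary : -primary;
  }
  return a.id - b.id; // stable tie-break
}

class SortedPagingProvider {
  sortField: PersonField;
  sortAscending: boolean;
  serverCalls: number; // only here so the sketch can be observed
  private cache: (Person | undefined)[];
  private totalLength: number;
  private pageSize: number;
  private fetchPage: FetchSortedPage;

  constructor(totalLength: number, pageSize: number, fetchPage: FetchSortedPage) {
    this.totalLength = totalLength;
    this.pageSize = pageSize;
    this.fetchPage = fetchPage;
    this.sortField = "id";
    this.sortAscending = true;
    this.serverCalls = 0;
    this.cache = new Array(totalLength);
  }

  // Returns the row at index, loading its page (in the current sort) if needed.
  getItemAt(index: number): Person | undefined {
    if (this.cache[index] === undefined) {
      const start = Math.floor(index / this.pageSize) * this.pageSize;
      const page = this.fetchPage(start, this.pageSize,
                                  this.sortField, this.sortAscending);
      this.serverCalls++;
      for (let i = 0; i < page.length; i++) {
        this.cache[start + i] = page[i];
      }
    }
    return this.cache[index];
  }

  sortItemsBy(field: PersonField, ascending: boolean): void {
    this.sortField = field;
    this.sortAscending = ascending;
    if (this.isFullyLoaded()) {
      // Everything is on the client, so no server round trip is needed.
      (this.cache as Person[]).sort((a, b) => comparePeople(a, b, field, ascending));
    } else {
      // Start over: the view will ask for rows again and trigger new fetches.
      this.cache = new Array(this.totalLength);
    }
  }

  private isFullyLoaded(): boolean {
    for (let i = 0; i < this.totalLength; i++) {
      if (this.cache[i] === undefined) {
        return false;
      }
    }
    return true;
  }
}
```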

Maintaining the current selection

If you played with the example you may have noticed that when you change the sorting options the DataGrid maintains its vertical scroll position but does not maintain the selection. This is the default DataGrid behavior and is fine for small amounts of data, but it might be frustrating if you’re looking to use the sort to group similar items near each other starting from your selected item. Example 5 is one attempt at addressing this issue (and I’m not guaranteeing that it’s the best). Another method, getSortedPosition, is added to SortedValueListService; given the same search criteria as getSortedElements, it will tell you the 0-based position of the element you pass in. This requires another stored procedure in the database for MS SQL Server but would be similarly straightforward on another database.
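One way to think about getSortedPosition is as a count: the 0-based position of an element is the number of rows that sort ahead of it. The TypeScript sketch below assumes that reading; it is not the stored procedure from Example 5, and the SortableRow shape and both function names are hypothetical. An array again stands in for the table.

```typescript
// Hypothetical row shape and field names.
interface SortableRow { id: number; age: number; name: string; }
type SortableField = "id" | "age" | "name";

// Same ordering rule a paged query would need: sort field first, id as tie-break.
function compareForSort(a: SortableRow, b: SortableRow,
                        field: SortableField, ascending: boolean): number {
  const x = a[field];
  const y = b[field];
  let primary: number;
  if (typeof x === "number" && typeof y === "number") {
    primary = x - y;
  } else {
    const sx = String(x);
    const sy = String(y);
    primary = sx < sy ? -1 : sx > sy ? 1 : 0;
  }
  if (primary !== 0) {
    return ascending ? primary : -primary;
  }
  return a.id - b.id;
}

// 0-based position of target under the given sort: count the rows ahead of it.
function getSortedPosition(rows: SortableRow[], target: SortableRow,
                           sortField: SortableField, ascending: boolean): number {
  let position = 0;
  for (let i = 0; i < rows.length; i++) {
    if (compareForSort(rows[i], target, sortField, ascending) < 0) {
      position++;
    }
  }
  return position;
}
```

The important detail is that the position lookup and the paged query must agree on the ordering, including how ties are broken; otherwise the returned position can point at a neighbor with the same sort value.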

Integrating this change into the Flex code is a little more complicated, as I didn’t want to corrupt the DataProvider itself with the concept of a selected item; that is dependent on the view into the data. So what I decided to do is control the sorting operations myself instead of letting the DataGrid manage them. Now when the header is pressed I keep track of what the intended sort is, then get the index of the selectedItem and find out what its sorted position would be (this is all before actually modifying the DataProvider). Finding out the new sorted position requires a server call, so you might want to use the busy cursor or some other mechanism to indicate that things are happening, but I didn’t in the example.

Once the new sorted position is known for the selected item we can go ahead and sort the list. Next we scroll the DataGrid to where the selectedItem should show up, and finally, once that page has loaded, we select the actual item. The reason we need to wait for the page to load before setting selectedIndex is that the DataGrid will clear the selection if it changes the underlying data.

If you put a trace in the handleLoad method you’ll see that a few pages are loaded after a sort even though they aren’t the pages that should be visible. I think I’d probably need to change the DataGrid class to avoid this, so we’ll ignore that for now despite its minor performance impact. If we were going to generalize this I think we’d want to subclass the DataGrid so that it knows it’s working with the pageable data provider, and give it a reference to an object that can provide certain information like the sorted position of an element.
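The order of operations is the tricky part, so here is a rough TypeScript sketch of it. It is not the code from Example 5: GridLike, SortableProvider, PositionLookup, and SelectionKeepingSorter are all hypothetical stand-ins, and the real DataGrid and data provider have much larger APIs. The sketch only encodes the sequence: look up the selected item's new position, then sort, then scroll, and only select once the page containing that position has arrived.

```typescript
interface GridItem { id: number; }

// Hypothetical stand-ins for the parts of the DataGrid and provider used here.
interface GridLike {
  selectedIndex: number;   // -1 means nothing is selected
  scrollPosition: number;  // index of the first visible row
}

interface SortableProvider {
  getItemAt(index: number): GridItem | undefined;
  sortItemsBy(sortField: string, ascending: boolean): void;
}

// Asks the server where an item would land under a given sort.
type PositionLookup = (item: GridItem, sortField: string, ascending: boolean,
                       onResult: (position: number) => void) => void;

class SelectionKeepingSorter {
  private grid: GridLike;
  private provider: SortableProvider;
  private lookup: PositionLookup;
  private pendingIndex: number; // row to select once its page has loaded

  constructor(grid: GridLike, provider: SortableProvider, lookup: PositionLookup) {
    this.grid = grid;
    this.provider = provider;
    this.lookup = lookup;
    this.pendingIndex = -1;
  }

  // Call this from the header press handler instead of letting the grid sort.
  headerPressed(sortField: string, ascending: boolean): void {
    const selected = this.grid.selectedIndex >= 0
      ? this.provider.getItemAt(this.grid.selectedIndex)
      : undefined;
    if (selected === undefined) {
      // Nothing to preserve, so just sort.
      this.provider.sortItemsBy(sortField, ascending);
      return;
    }
    // The lookup happens before the provider is modified.
    this.lookup(selected, sortField, ascending, (position: number) => {
      this.provider.sortItemsBy(sortField, ascending); // the grid drops its selection here
      this.grid.scrollPosition = position;
      this.pendingIndex = position;
    });
  }

  // Call this whenever a page of rows [start, start + count) has arrived.
  pageLoaded(start: number, count: number): void {
    if (this.pendingIndex >= start && this.pendingIndex < start + count) {
      this.grid.selectedIndex = this.pendingIndex;
      this.pendingIndex = -1;
    }
  }
}
```

Keeping this logic in a separate object, rather than in the provider, matches the goal above of not teaching the DataProvider about selection.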

Conclusion

So that’s the basic approach for sorting large data sets. If anyone has experience that might improve upon this, by all means let me know. In the next article I’ll write about some things we can do to improve the perceived performance when dealing with large data, and then I think we’ll try to move on to other topics.

18 Responses to Large Data Sets (Part 2 of 3?)

  1. Hans Omli says:

    I’ve seen this done in a similar fashion with explicit paging, but hadn’t seen it done well with implicit paging. Nice work! I’d be very interested if someone works out the issues with maintaining the current selection.

  2. JesterXL says:

    Just an idea; not sure if Flex exposes an easy way to get at onEnterFrame, but you could create a temporary loop to sort manually that way, however, she’d be dog slow. The other way is to run an interval at about 100 milliseconds, and sort that way. The issue is you’re dupin’ your array/dataset just to create a sorted one, but at least you have a faux “background process” doing it. In cases where the user is connected to the backend with a decent connection, and that connection is reliable, this is by no means any worth as paging, even in my limited experience, is a lot faster. However, if it’s one of many large datasets, and you’re perhaps saving locally in a SharedObject, this background parsing helps to keep what you have locally sorted without spazzing the client out. Either create a new array/dataset, and use an onEnterFrame/interval to create such.

  3. Matt says:

    Hey Jesse, I agree that background processing is a useful thing to do and an onEnterFrame handler might be the way to do it (we don’t really expose it but we do have the doLater command which could be used for the same purpose). However Player 7’s sorting has gotten pretty fast so I’m not sure there’s a real need to spread sorting of the data you have across multiple frames (unless the compareFunction is really crazy in which case if you do spread it across multiple frames you’ll need to write the sort algorithm yourself as well). The problem with sorting just the data you have locally is that you’d then need to insert the data you bring down from the server into that sorted list, meaning you can’t just replace a whole chunk of data but have to individually place each piece. I think the user-experience wouldn’t be great here as data would fill in randomly, possibly moving the current item.

    Though I might be missing your point here.

    Matt

  4. nig says:

    Hmm.. Selection is *supposed* to be maintained after a sort in the dataGrid.. It does get cleared if you’re resetting the DP (but I don’t see that in the example). Usually if sorting breaks selection, it means that there’s something wrong in the dataProvider’s getItemID(). The grid uses the item’s ID as a key into its selection hash.

    The getItemID here looks ok (and familiar), but it looks like the clear() method is destroying each item in the DP. Unfortunately, the ID of each item is stored on the item itself (very stealthily), so by destroying each item, you lose those original IDs, and selection.

    So, in order to maintain selection, you’d have to have those object IDs persist between clears. A getObjectID that relied on a stronger mapping of Unique IDs to data content would work.. (ie, if your query returns P_IDs of the records, return those in your getItemID()).

    In the end, I think your answer is probably simpler (although I apologize for forcing anyone to have to use sortOnHeaderRelease).

    cool topic! Paging dataProviders are something users really need to learn about. Here’s my suggested topic: I was working with a paged app the other day and realized as well that by the time I’d scrolled all the way through the grid, my RAM usage had surged to like 150MB. I realized that although paging had kept my download manageable, at some point I needed to *kill* records from memory when they weren’t needed. Any thoughts on this subject?

    again, good stuff.

    nig

  5. Paged DataProviders over at Matt’s Place

    If you’re doing DataGrid work in Flash with tons of data, sometimes dataProviders can just take too long to load. The cool thing here is that the DataProvider API is abstract enough that you can write your own dataProviders…

  6. Matt says:

    Hey Nig, You’re right that the clear is killing data since we’re essentially re-downloading the data from the server. I suppose I could go ahead and implement getID on the CensusEntryVO and that would perhaps avoid the issue. Plus doing that on the VO means that the DataProvider doesn’t need to change its getItemID logic.

    Killing old pages is a good idea for my next article which will be overall performance considerations.

    Matt

  7. JesterXL says:

    Der… I figured at that point you had the data already. If you do, the interval the mofo, otherwise page if you don’t have all the data (to avoid the randomness… didn’t think about it)… course, 7 being fast, even with 6, to me, if you’re having issues, you’re doing something wrong. Either you’re loading too much data, or your problem can be re-factored. I guess there are those that need that much data, though, so if you’re in a bind, I can see the need… well, ideally at least.

  8. Max says:

    Hmm…I can feel the smell of a new DS/DG component on the next devnet cd

  9. Ross says:

    Matt,

    This is “off topic” but I can’t seem to find anyone with an answer.

    How can we call Functions that we have defined in Flex from Javascript in the Browser?

    Seems like a pretty straight forward requirement, but I can’t seem to find anything to help.

    Thanks,
    Ross

  10. Matt says:

    This is the kind of question for the flexcoders list or the support forums. I’d like the blog to stay on topic if possible.

    JavaScript/Flash communication is possible but there are some limits. You can really only use JavaScript to set variables in your swf, however those variables can be setters which could in turn execute code. Check out SetVariable in the Flash docs. I believe people have started to address this on the forums too.

  11. Ross says:

    Matt,

    Thanks for the post. The “really only use Javascript to set variables…” was enough for me to get going.

    It seems to me that as Macromedia positions Flex for developers like myself, who are only now becoming interested in Flash as a consequence of Flex, the client-side plumbing should be seriously considered. We have looked at Lazlo, DreamFactory etc. and they fall short in this respect also.

    Things like supporting an id attribute when rendering Flash from Flex so we don’t have to fuss around with HTML wrappers that preclude using .jsp pages to generate the Flash from mxml etc. without clumsy client-side Javascript to assign an id are important issues.

    Registering for events, performing function calls etc. across the Browser -> Flash Player -> Flash Movie interfaces will help ensure Flex success in the Enterprise. We cannot simply move everything to Flex, it has to integrate tightly with pre-existing applications and infrastructure both server-side and client-side.

    That said, Flex is cool and I am working hard to evaluate if we will be able to take advantage of it.

    Thanks for the help. I’ll keep the next post on-topic.

  12. carl says:

    i think we need a example for flash! cant any of the experts help us poor twisted newbies to find a start point on using paged datagrid results ???

  13. Peter Ent says:

    What about charting data in Flex? You would need all of the data to do a chart properly. In financial services we often look at several years’ worth of data as a chart of some sort. The database can certainly pre-sort anything, but you still need all of the points – or else you might miss a spike here or there.

  14. Matt says:

    You’re right, a Chart might require all of the data at once. I’m not talking about that right now but as we see more and more charting apps perhaps I’ll try to think about some strategies.

  15. Prismix Blog says:

    MySQL version of Matt Chotin’s Large Data Sets SQL (Part 1 of 2)

    Matt Chotin wrote three really great articles on managing large data sets in Flex. Part 2 of his series used MSSQL stored procedures. As we are using MySQL I decided to figure out the SQL for MySQL. Below is the…

  18. Prismix Blog says:

    MySQL version of Matt Chotin’s Large Data Sets SQL (Part 2 of 2)

    Yesterday I posted an entry on converting the SQL used in Matt Chotin’s second article on Large Data Sets in Flex to work with MySQL. Today I post the second lot of SQL. Enjoy! Each block (from yesterday and today)…