Nick Bastin

Nick Bastin header image 1

It’s coming…..!

August 24th, 2009 · No Comments

I finally broke down and ordered a Das Keyboard Professional. It has had several good mentions over at StackOverflow and I have had zero luck walking down the aisle of keyboards at MicroCenter/BestBuy/CircuitCity and finding anything that felt better than plastic depressing into Jello. The standard Apple keyboard on my Mac Pro has good action, but the keys have ridiculously short throw – more like modern laptop keyboards. So, hopefully it’ll arrive on wednesday and I won’t have an angry rant about how much it sucks…

→ No CommentsTags: Hardware · Software Development

PyOhio from afar…

July 26th, 2009 · No Comments

Sadly I have the flu, so I haven’t been able to go to PyOhio. Word on the mailing list is that things are going well, and I’m disappointed that I’m not there sprinting, speaking, and generally hanging out with other fun python developers, but infecting other people is regarded as bad form.. :-/

I’ve spent the last few days hacking around on my android phone and I’ll probably have some more to say on that in the near future. At the moment, my only solid impression is that the emulator is quite slow on MacOS X – perhaps this is true on all platforms – almost disappointingly so, and using Eclipse makes me cry.

→ No CommentsTags: Python · Software

Why collections.deque is my new favorite data structure

July 15th, 2009 · No Comments

In the process of unrolling some recursive algorithms to be iterative, I’ve started using collections.deque, and I’ve decided it’s my new favorite general-purpose python data structure. This wasn’t a hard decision – lists are the general utility collection type and they suck, so really anything ordered and mutable was probably going to be better (I almost never need the rotational capabilities of deques, so a somewhat lighter structure would actually be better, like a doubly linked list).

Here’s the fundamental problem: python lists aren’t lists by any standard computer science definition of the word – they’re arrays. I think this should be a surprise to people who don’t know anything about Python. There are at least a few core python developers who have tried to convince me that making the “list” data structure be an array “is just easier for newbies” (along with some rather condescending comments about how it’s obvious that it’s an array since it supports direct indexing). This is complete horseshit. If someone is new to programming, they really don’t care if you call it an array or a vector or a list – they don’t understand what any of those things are. In fact, calling it a list calls up connotations from real life that are not generally accurate, so best not to confuse them anyhow, in my opinion.

You might just think this is a stupid arguments about semantics, but it has much larger implications towards when you would use a given data structure (and choosing the right data structure for a given job is something that most programmers, and certainly new ones, do very poorly – lets not design a language that encourages that practice, eh?). If a list were really a linked list, you would probably take away direct element access (list[35]) – although you might not, there are ways to make some common cases of direct element access reasonably efficient in linked lists – but you’d leave the API otherwise the same. I’m not really sure who this would really hurt (and of course you’d be free to implement vector() or array() to solve that specific problem), since the most common list use case is iteration over some or all of the objects (a cursory examination of the standard library indicates this is true, but I don’t have hard numbers). However, the list API already offers some truly inefficient members – ones which I think might have more common use (at least in my experience) than direct element access. Notably, that member is list.insert(idx, data) – it allows you to insert anywhere you’d like in the list, although in practicality one of the most common places to insert is at position zero. This is incredibly inefficient.

Of course, if you want to always insert on the top of the list then it can be argued that you should just use list.append() – which is very fast – and reverse() or pop off the bottom of list. The problem is when you want to be adding elements to both ends. List is a VERY bad data structure for double-ended operation, especially if you have a lot of items. list.insert() can cause some serious memmove() action when your lists get bigger than 10000 items (more data below). Of course, for that matter even append() can cause serious problems when your collections get to be more than a couple million elements – you can very easily fail to allocate a few megabytes due to memory fragmentation even when you have plenty of memory free, since arrays have to be contiguous blocks. So, I would call that something other than a list, because it very much behaves differently, and bring it in line with other programming texts that people might pick up. You can still make it your primary data structure if you like, but please, please, call it an array or a vector, and write about performance guarantees in the python documentation (for all data types).

So anyhow, because Python doesn’t actually have an efficient doubly-linked list collection, you’re stuck with deque. The implementation is efficient, though, so it’s plenty serviceable as a doubly-linked list if you need a double-ended data structure with efficient insertion at both ends or need to store millions of rows (deques are allocated in blocks, but those blocks can reside anywhere on the heap, making reallocation a relative non-issue).

I made a few tests that mimic my use case and I’ve put the results below. The test is the same code for both lists and deques – just the container type is changed (and list.insert(0, data) is replaced with deque.appendleft(data)). The test creates an empty structure and inserts 100000 elements into it, alternating every other element between appending and inserting on the top. Timings are from the test loop run 10 times, each test is then run 3 times.

Deque: 0.46s, 0.47s, 0.47s
List: 48.86s, 48.22s, 49.62s

I had originally put a million items into each collection, but the list test did not want to complete in anything resembling reasonable time, even alternating between append and insert (insert(0, data) on a 10000 item vector 10 times takes around a second, while using a 100000 item vector takes around 100 seconds). list.append() is slightly faster than deque.append(), but it’s only a couple % (probably due to the efficiency of having a LIST_APPEND opcode), so I continue to use deques anytime I have any kind of medium-to-large data set, even if I’m just appending and iterating.

→ No CommentsTags: Python · Rants · Software

PyOhio Hotel Info

July 11th, 2009 · No Comments

Catherine has arranged for a room block at the Blackwell Inn (which is right across from the conference location) for $109 a night, so grab rooms while they’re hot…

We still haven’t nailed down a specific location for the sprints yet, but they’ll either be at the hotel or at Knowlton (the main conference location). I’ll update the wiki soon to put the sprints on the main conference schedule, but the intended times are sometime friday afternoon/evening (depending on when people start to trickle in – I’ll likely be there mid-afternoon), and saturday evening after the main program is complete.

→ No CommentsTags: Python

PyOhio Sprints

June 26th, 2009 · No Comments

I’ve accepted the rather loosely-defined job of PyOhio Sprints Coordinator, and I’ll be sending out an email soon outlining at least a skeleton of a plan and looking for volunteers to herd cats the day-of for individual sprints, as well as for sprint ideas. Because the event is likely to be relatively small, I’d like to open up sprints to remote participants, so I’ll be thinking about how to do that as well. My current thoughts are that we at least need a Jabber server with a “room” per sprint topic (XMPP is so much easier to deal with than IRC these days), some sort of remote display capability (shared VNC, gotomypc.com, webex, etc.) so that someone can show demos for everyone, a development wiki, and probably a local copy of most source control tools.

→ No CommentsTags: Python

Gmail has jumped the shark again

June 19th, 2009 · No Comments

Every once in a while gmail seems to forget to execute the “Skip the Inbox” step of my filters. This is almost certainly guaranteed to happen to a few messages every week (they get labelled, but not archived), but tonight it happened to about 100 messages and flooded my inbox, a real treat. Anyone else having this problem? Any way to fix it? Hopefully google isn’t doing anything else bad to my mail?

→ No CommentsTags: Software

Roundup…

June 15th, 2009 · No Comments

Yes yes, I’m back to my non-posting ways. Woe is me.

The good news (sortof…) is that both of my talks were accepted for PyOhio. The bad news, of course, is that now I have to write them both, but I suppose it’s not nice to complain about an embarrassment of riches…

I still have yet to really poke around the Android SDK, but my experience with the G1 thus far has been mixed, with battery life still a major concern. I wonder how long it will be before tools to optimize the power impact of your application will added to the stable next to profilers and network optimization tools. I haven’t done a lot of research into the topic, but empirical evidence suggests that Android could benefit greatly from a centralized data sync service, where applications merely request to be sync’d the next time the service ticks, so that you don’t have your data connection fired up every 5 minutes, even though you told every application to sync every hour (because of course all of their hours are aligned differently). It would also solve the annoying problem of not being able to tell some applications how often to sync.

→ No CommentsTags: Hardware · Python · Software

Unicode or bust

May 28th, 2009 · No Comments

While this was mostly already true, I’m making it official and trying harder. From now on, every document I create and message I send will be unicode and unicode only. If your mail client or text editor can’t read it, try upgrading and joining the rest of us in the 21st century. It’s really ridiculous at this point that we’re still dealing with crazy codepage and encoding nonsense.

→ No CommentsTags: Rants · Software

PyOhio Proposals

May 15th, 2009 · No Comments

As per a note from David Stanek on the organizers list, I managed to get two “summary” proposals in under the wire tonight, for which I can finish putting abstracts together this weekend, hopefully. One is a somewhat flexible idea for a presentation on extending and embedding – could be anywhere from a somewhat lecture-like presentation of the pros and cons to a slightly tutorial-like nuts and bolts presentation – although 40 minutes is mostly just enough to whet someone’s appetite, but that’s why everyone hangs around to chat at conferences and help with individual experiences. :-)

The other pitch is a session on using Amazon Web Services in your Python applications. There seems to be a lack of coverage of this on the Amazon forums and internet in general, while it’s a super useful tool for being able to abstract your application from any real network resource reliability problems. I have specific experience using EC2 and S3, as well as enhancing the Amazon contributed libraries (the default S3 one in particular is pretty lousy from a user experience standpoint), and will probably do some research into SQS, SimpleDB, and EMR so that I can intelligently touch on the basics of those topics at the conference.

→ No CommentsTags: Python · Software

The idiocy of….idiots.

May 15th, 2009 · No Comments

Look, site hackers are generally bad people. I don’t think we need to have much discussion on this topic – at best, they’re up to no good, and at worst they’re truly malicious. However, some of the quotes in this story quite miss the point:

http://news.bbc.co.uk/2/hi/technology/8049780.stm

I have exactly zero sympathy for these people. What should have been a minor annoyance ends up being the destruction of over a decade worth of community work because someone was too stupid to create a real backup plan. Let this be a lesson to all of you community-minded individuals out there…don’t assume that the people who run these sites have a brain in their head.

→ No CommentsTags: Rants