Data Structures: a rant

(If you’re curious, the textbook from this class–and the source of the explanatory links I’m using in this post–is available online free here:

– – –

I just got out of another Data Structures class. I zoned out halfway through, which is somewhat unusual for me. Happens more often in Data Structures, because the class starts at 8AM. (Fortunately, I think I found a less winding route to cut down on my time walking to class–hopefully the shortcut will make me less prone to lateness. I am not a morning person. I tend to dream about waking up and going to class. I’ve even slept through the 10AM labs, which is ridiculous and frustrating–but also kind of understandable, because they’re on the day after the 8AM class, and my brain is trying to get rid of its sleep debt.)

But that’s not really why I started tuning out the professor’s lecture today. It was the result of another impractical implementation on the projector, at which point a thought struck me:

If these data structures are so useful, why aren’t they part of the language already?

Python (which is what we’re using, 3.5 at that) is not a dinosaur, and most of the math for these data structures was done like fifty years ago. I’d understand if the point was to get C to be more efficient, but most people don’t even code in C any more. It’s not the right tool for very many of the jobs we have to do.

In fact, this class seems closer to a language design course than something practical to software development. I know that’s the point of the high-flown “computer science” department, but… come on. Even interviewers are getting the idea that this kind of question doesn’t matter that much in the real world. It seems more to me like something to be learned on the fly, when needed. Which in practice would probably just mean memorizing Cracking the Coding Interview because you need a job.

Why does this theory class have to be so… theoretical?

I wish the teacher would spend some time telling us when to use these structures, rather than just what they are and how they’re implemented. Otherwise I think this course may do more harm than good.

Why in the world would you ever use a singly-linked list? Most languages–and especially the ones most commonly in use–have array or list structures of their own, which are 1) optimized for you, 2) don’t require extra code, 3) have been tested far more thoroughly than your code ever will be and are thus more stable and predictable, and 4) don’t confuse the hell out of other people who come in and read your program!

Why in the world would you bother implementing a doubly-linked list either, unless you’re coding a programming language of your own?! None of this makes sense! We have a dedicated class for language design! We have a dedicated class for C programming! Why isn’t it in there instead?!

Ahem. *deep breath*

Hash tables are kind of cool, and so are binary heaps (although they’re less practical). My affection for their clever hackitude is rather stifled by the suspicion that, again, they’d probably be built-in structures if they were that useful. Like Python dicts–those are hash tables behind the scenes. Use those. Everyone knows what they are and you don’t have to code them yourself.

If you’re working with large amounts of data, hash tables and binary heaps could be useful. But the professor doesn’t talk about when, just how to use them. If you’re not working with big data, chances are you just need a dict or list, and can spare yourself and others the experience of trying to interpret your weird code later.

But the professor doesn’t talk about that kind of thing. I wonder if he’s getting homework assignments that use this stuff unnecessarily. He hasn’t been taking points off mine because my work still does what the spec asks for, as clearly/briefly/user-friendly-ly as possible. I haven’t been looking for ways one might use weird data structures in the assignments, so I don’t know if they’re designed to invite us to use them or what.

He also covered recursion, which I think is very useful–again, if you know when to use it, which depends both on the problem you’re solving and the language you’re using. I will casually use recursion to make my code cleaner-looking even if it isn’t always the most efficient option (in Python). But that’s for readability. Mostly I use it when I’m getting user input, and I stick the prompt in its own method and loop if I get bad input. That lets me put all the lengthy, messy error-checking off somewhere and the main program will get back a good value.

I think this practice is supposed to be kind of evil according to functional programming, because user input functions aren’t pure functions? Or maybe, because I’ve sectioned them off and make sure they return good values, they’re exactly what you’re supposed to do in functional programming. Don’t know yet. It works, anyway.

I still use “else:” almost exclusively for “Can’t Happen” errors. It saves me headaches when I screw up the recursion. That’s a minor change with using recursion, but it’s an issue only while writing; I’ll put in extra effort when I’m writing the program to make sure it’s easier to read later.

What I will not do is write code whose purpose is to make me look or feel clever for writing it a certain way. That’s insane. Sacrificing readability, even efficiency because you think it’s cooler to use some homebrew data structure code rather than a freaking built-in Python list? That’s absolutely insane.

When interviewers select for people who know how to do this, I wonder if they realize they may be selecting for a subset which includes the actual worst candidates: the smug “Of course I know everything, if you lesser mortals don’t, that’s your problem, Google it, and also I have very strong opinions about i++ vs. ++i and will totally correct you if you’re wrong.” Fools and incompetents may eventually learn. This person doesn’t think they have to.

(Of course, sometimes a person like that is useful to look impressive in customer meetings, but in my admittedly limited experience, they’re just as likely to insult your customers as to impress them. Ever seen a bunch of startups pitch? Some of those folks need an attitude adjustment. Of course, they’re going to fail; it’s hard to have enough empathy with customers to produce a good product if you have such scorn and disdain for their intelligence. Then they’ll claim they went under because they “failed to pivot.”)

Oh, one more thing. Big-O notation is pretty useful. Even if your main use for it is understanding what other people are talking about, and/or making yourself look impressive. It’s so you can find out which of two algorithms is more efficient, which will probably be something you’ll need to do eventually. It looks way more complicated than it is.

I could very well be wrong about this, and I’d be pleased to find out if I were. I’d love to find something that’d make me a better programmer. I’d be terribly pleased to find out I wasn’t learning something nearly useless. But I don’t think that’s the case.

The CS math course I have (enigmatically titled “Discrete Structures”… I have yet to find out why, unless the reason is “because it sounds fancy”) seems a little more practical at least. Logical thinking comes pretty naturally to me, but I know it doesn’t to everyone, so it makes sense to cover it. I know there’s also set theory, graph theory, and combinatorics later on in the course, and I’ve heard those are good things to know for programming. I guess I’ll find out.

Oddly enough, it’s my English course that’s the most practical. The professor decided to forego the traditional giant clunky writing textbook and told us to order this list of like eight different short story collections, which we read and then write and talk about. That takes up about half our class periods. The other half (each day is dedicated to one or the other) is the professor talking about writing: elements of stories in general that are important (like what the stakes are–is war at risk, for instance?), writing style (like sentence structure, word choice, voice, POV), what it means to have universal appeal, what it means to capture an audience and how to get their attention–and especially the skills of critical thinking and backing up your claims.

This format is a little odd for a class called “College Writing and Research”–but it isn’t bad. I think it’s more effective than the normal route. No writer I know learned writing out of a big clunky textbook, or by doing essays on symbolism in Dickens or whether school uniforms are a good idea.

Since CS folks often point and laugh at the liberal arts, saying their study is useless bull and all those English majors should be studying something useful and practical (like computer science of course), well… I have an uncomfortable piece of news.

Not that I’m saying majoring in English is more likely to get you a job (although it’s better off than, say, Gender Studies). But knowing how to write effectively, which my English class will teach you, is far more practical than knowing how to invert a binary tree.

Sorry, prof.

Please read the comments.


9 thoughts on “Data Structures: a rant

  1. Okay, I’m gonna need to disagree with you.

    Yes, languages have a bunch of data structures built in, and those are generally more convenient to use, if you *know* you’re only dealing with small datasets. But I would argue that you need to know what those are behind the scenes.

    What is a built-in list, in Python? Is it a dynamic array or a linked list? The difference is going to seriously matter when you’re choosing how to store that massive dataset that you’re updating and deleting all the time, as opposed to that other massive dataset which you only have to declare once, but then want to search through a bunch.

    A linked list, for example. is a very useful thing to have if you want to maintain some kind of queue – like requests to a website, or processes to run on servers. It’s also often much faster to move around pointers to objects than the objects themselves, which is what linked lists do and what arrays do not. (Why this isn’t in your C class is another question, and not one I can answer, sadly 😛 )

    They are also a very good lead-in to thinking about trees, which are astonishingly useful any time you’re dealing with a large dataset. Or if you want to deal intuitively with data which has some kind of natural hierarchy – true a lot of the time. Particularly true for a lot of interesting problems, like machine translation or computer vision or source-code parsing or…

    You know what, when it comes to “use of CS theory in real-world situations”, I’m going to redirect to Steve Yegge, specifically – puts it much better than I could, and with a lot more real-world experience.

    Sacrificing readability? Encapsulate your effective data structure in an object or a function with a neat, comprehensible name, and refer to that.

    I think what it comes down to is your claim that built-in language constructs are optimized for you. But optimized for what? For access time or efficient memory usage? For insertion and deletion, or for indexing? These things are often mutually exclusive, and you need to know what you need behind the scenes, and what it is you’re actually dealing with. Otherwise you try to sift through a client database and your program hangs a million years. Totally not speaking from experience.

    It’s true, you might not be dealing with big data now, but 1) you don’t know how your input is going to scale in the future, 2) f someone can crash your software by feeding it big enough inputs, that’s potentially a dangerous exploit, and it’s much harder to predict all ways this can happen than just picking a good data structure to start with and 3) I mean, good habits are good.

    *deep breath* THAT SAID. That looks like a terrible textbook. It’s dry and, as you suggest, devoid of real-world context. I’m part way through The Algorithm Design Manual – pdf here – which goes through some fundamental algorithms and data structures with regular references to “War Stories”, real life situations where the algorithms were actually used. I’m very much enjoying it – highly recommend.

    (Sorry. I can do short and non-opinionated comments, I promise.)

    Liked by 1 person

    • YES!!! Thank you, you answered my question. Finally, someone has some sense.

      Again, I was totally prepared to be wrong about this. But aaaarrrggh. Can we not have some context, please?

      I will definitely look at that book you linked.

      I need to remember that this school, while leagues better than community college, is still probably kind of dumb about how it approaches some things–and the dumbness may not belong to the things themselves.


    • I still think that these structures should be part of the language, as a well-documented “big data” library that can be called upon when necessary and left out when it’s not, and that learning the API should be something that a programmer can pick up when they need to, just like we pick up programming languages all the time. The best two or three implementations of each structure (according to different optimization priorities), marked as such, in a library–so we don’t need to spend so much time trying to learn how to implement them.

      This is my bias for self-teaching showing up again, by the way.

      I also think that the prominence of this material in the program is a direct result of math envy: the CS department trying to look more prestigious. A programmer today is at least as likely to be working on mobile and/or web platform applications as they are to work in a problem domain where this material is useful, and in some schools those platforms–which have inherent design and coding peculiarities that aren’t necessarily obvious–aren’t covered at all. (Believe me, I visited several while college hunting, and they were quite proud of their “engineering focus.” I didn’t apply.)

      Some mobile and web applications–particularly the larger, more lucrative ones–are heavily involved with big data, yes. But it’s just an area of software development, not necessarily a more or less important one, and its tactics are not to be applied everywhere.

      I suppose my argument now boils down to the fact that I think CS should be more software development focused, or that there should be a separate major for software dev.

      There’s got to be a better way to do this. Maybe it’s just the textbook that’s making me mad. I’ll check yours out.


      • (Sorry about delayed reply, it’s been a weird week.)

        Re: these data structures should be part of the language, I mean, often they are. For languages like Python, at least, which is one of the reasons I am *so baffled* that your school seems to have chosen to write this course in Python.

        But yeah, you’d mentioned dictionaries, which are hashtables behind the scenes. Sets too. And Python has arrays, and lists which I *think* are dynamic arrays – not sure it has linkedlists, but there are libraries with queues and such which can be used for similar things.

        Honestly it would make much more sense if this course was being taught in a more minimal language like C, which requires you to do this kind of low-level stuff yourself more of the time.

        That said, I don’t know whether I’d prefer a life where everything is APIs. Something ESR wrote somewhere about Java stuck with me – something like how it approached programming like a plumber with a box of spare parts. I’m heavily for self-teaching as well – my formal CS education is pretty much non-existent – but I guess I like knowing more of what’s going on. I don’t know, maybe this is just snobbery.

        As for it not being relevant to most people – maybe? It’s maybe a little unambitious to assume that your mobile or web app will never be required to scale, but that’s a different matter 😛 It kind of comes down to basic application-programming / working with APIs and so on being something that most smart, motivated people *can* self-teach. As you say, there are quirks, but often it seems to just be about knowing the magic words to say to the API. If software dev was always about that kind of thing, as you’ve said in previous posts, no-one would need to go to university for CS, except maybe to meet people.

        But if you ever wanted to get into trickier technical stuff – the kind of thing that maybe requires you to write your own APIs, even – I think that’s when the theoretical classes become more useful.

        As a sidenote: your professor might surprise you, if you put in a request for more context or real-world examples. Do you have any way of giving feedback on the course, anonymous or otherwise?


      • They teach in Python because the intro class is in Python, and this class comes early in their curriculum. Remember, I’ve had two years of college–but it’s been an entirely different set of courses that doesn’t transfer well, so I’m still pretty early in their set.

        I can sort of understand not making the jump right to C from Python–I think they’re trying to encourage people into the major–but why not pick a language with libraries for the functions already? You’d have code examples that are better than the ones in this stupid book, each of which the professor has to rewrite sensibly. I don’t think we’ve come across a textbook example he hasn’t rewritten because it did some stupid thing.

        I don’t see why this is something you couldn’t self-teach. Lots of APIs are more complicated than this, I think. It’s not about assuming you won’t have to scale, but knowing you have the ability to adapt.

        I guess that comes back to an unspoken question: what is hard to learn on your own, enough that it needs to be taught? And I’m not pleased with that question, because I’m still secretly wondering about how much formal schooling is worth, at least to me. (I hesitate even to write that. I’m staying for now, anyway.)

        That’s the art, I suppose. Everyone has their own set of dumb quirks that need to be refined so they’re not making the same stupid mistakes. But to teach people that might even require an entirely different format than is found in universities.

        So tackling this topic is probably as good as anything else they could teach. (Man. It’s starting to sound like I’m complaining that this isn’t more difficult, isn’t it?) I still wish it was a little more down-to-earth. Not sure I’d poke the professor about it. Maybe if I see a good way to.

        Maybe they’re building up to something else that’ll use it that way–maybe this is like those math classes that are so boring in high school because they’re a basis for interesting, useful stuff later. I can’t tell yet.


  2. Huh. Comment-reply nesting in wordpress doesn’t seem to work past the above level.

    It is something you can self-teach, I think. It’s probably a bit harder to self-teach, and maybe less likely for someone to self-teach.

    Main differences between something like data structures and APIs: data structures probably more universal, so you don’t have to teach people to solve the same problems again and again in different ways. Data structures might also be trickier for some people, since seeing why they’re useful is easier if you have a bit of a maths background.

    And it’s probably easier to see why you need to learn an API (solves the problem in front of you) than why you might need to learn data structures (solve generic problems, sometimes in non-obvious ways, without requiring you to reinvent the wheel). Might mean fewer people would think to self-teach something like this, meaning universities need to provide motivation *and* means to learn.

    Wondering what’s hard to self-teach is a good question, I think. Ideally, a university would give you context on what’s worth learning, and then you can go figure out the details for yourself. Sounds like your course isn’t doing a fantastic job on any of these counts, though. 😛


  3. After reading the paragraph about Linked Lists I’d like to interject.

    In my opinion those courses have the goal to teach those CONCEPTS AND understand those Datastructures (incl. their Time Complexity & Space Complexity (& Locality) …) so that you get a feel for (designing with) them.

    e.g.: No dynamic Memory allocation? -> Allocate a big chunk of memory and roll your own (also has the nice side effect of introducing determinism & side-stepping allocation-wait-times & garbage collection).

    Algorithms used in games ( ) are partly VERY similar to example problems given in Datastructures courses and illustrate their use.

    If I remember right a singly linked list is sometimes used to iterate fast over small particles in particle effects.


  4. (Mutable) linked lists are one way to implement queues. In practice, they were much more useful in the 70s than they are now – memory accesses used to take a couple cycles, now they take several hundred CPU cycles and cache efficiency is king. So linked lists are much more likely to be performance hogs and may not actually be the most efficient option for implementing queues. You can implement a deque with good ammortized but bad worst case performance using a dynamic array and this is often a better option. Linked lists often end in performance disasters when you misuse them, so be careful when using them.

    Their main use (as immutable singly linked lists, often also either lazy or built with tail recusive modulo cons functions) in functional languages would be for control flow, not for storing data in. They are a nice pure functional alternative to generators (like python’s yield) in imperative languages. Don’t use them if you ever need to peek further down than the second element on each step of your algorithm though. That usually leads to slow algorithms.

    Heaps are useful when you need a heap or need to minimize something. Djkstra’s algorithm is a very good example. They are crucial for pathfinding problems or schedulers, and often useful for NP-complete problems with heuristics. If you simply can’t figure out how to make an algorithm fast, often a heap and some kind of greedy heuristic strategy is what you’re looking for.

    Arrays (usually mutable) are good when you need random access. The sieve of Eratosthenes is a great example of how mutable arrays can completely destroy linked lists in terms of asymptotic performance for some problems. They are also typically more suited to imperative programming than functional programming, and is the main reason why you’d sometimes want to avoid FP for performance reasons.

    Trees are also very useful, when you need random access but also need random insertion. You lose cache efficiency/loclaity of reference relative to arrays, but you gain a lot of flexibility. Lisp code is a tree of cons cells (well, Clojure does try to be different here). Pure functional programs often use trees as a reasonably good substitute for arrays. Clojure’s vectors are trees behind the scenes (and transient vectors are arrays iirc).

    Hashmaps are great, and fairly self-explanatory. You’ve used Clojure maps and python dicts, so you’re used to them. They are very versatile for general programming, especially in dynamically typed languages where they tend to replace structs to some extent. You can also use them to do cool stuff by putting lambdas in them. Add a few keywords, and you might get objects from this…

    As far as algorithms go, hashmaps are the second most important “fast” data structure with decent locality of reference after arrays. They are awesome for dynamic algorithms. They are great for counting or remembering things/solutions to subproblems.

    Divide and conquer, in order to live long and prosper.


    • Also, if you want to up your functional programming game, “purely functional data structures” by Chris Okasaki is a must-read. It’ll teach you how to write functional code that isn’t slow, and why laziness is amazing.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s