Wednesday Morning, 5:49 AM

Under no circumstances should one hunt a tentacled beast alone under the cover of darkness.

Wednesday Morning, 2:14 AM

It’s the middle of the night, but the coffee smells great and seems to be doing its job. I’m pushing to finalize the SSPI Main Data File tonight, but it’s looking like an uphill battle. I’m producing charts for each of the indicators, clicking through to the source data, making sure everything is in order…and yes, it’s sucking the life out of me, but my plan is to keep chugging through. It’ll feel so good when it’s finished.

My stuff1 is strewn across a few desks in the dimmed conference room at IRLE, and the white screen is definitely a few lumens too bright for my heavy, squinting eyes. I suppose staying up late like this is a bad habit I developed in school. My face certainly wasn’t unfamiliar to the late-night cleaning crews at Moffitt Library. I think I found the wee small hours of the morning generous growing up: paragraphs would seem to coalesce for me out of the darkness, and problem sets would usually submit to my will before the dawn.

But the darkness sure seems to be playing it coy tonight. (Tonight? Today? This morning? None feel quite appropriate for the current time. I guess there’s no word because there’s no sane reason to be awake at this unspeakable hour.) I’m ticking through the big checklist I made a few weeks back enumerating all the things that need to be done to finalize the data. The problem is that new checkboxes have been latching onto the bottom of the list much faster than I can pluck them off the top. Where did that data come from? Didn’t we finish that last semester? Why is our documentation missing? Great, the link is dead. Etc., etc.

“Check into coal power data” is the most recent innocuous-sounding addition to the checklist, but the cheery generality of the phrasing doesn’t fool me the way it might have a few years ago. In a few short hours, a tentacled monster will hatch from the egg concealed in that line, spreading out amoeba-like, sliding and squeezing its way into unexpected places, burrowing in and holding on with razor-sharp sawtoothed suckers, laying its own long delicate strings of pearlescent eggs in places yet to be discovered.

The beasts back in school didn’t have tentacles. Most were rather tame, actually. Some were hairy: Multivariable Calculus problem sets were positively piliferous in my freshman year. Others were slippery and slimy (Real Analysis); still others were covered in quills (Complex Analysis); and some would wriggle and squirm (Introduction to Greek Philosophy). But despite these off-putting features, these beasts were usually manageable and even made pleasurable quarry to subdue, with a successful hunt demanding only a little grit and cleverness under the cover of darkness.

Wednesday Morning, 6:28 AM

I have left IRLE and walked to Blue Bottle to enlist the support of more caffeine, which this morning—the now-rising sun pins it down definitively as “this morning”—comes in the thick and fruity form of the Mexican Single Origin Espresso. The only other customer here is an older man seated at the table next to me, talking loudly across the cafe at the barista, but saying the kind of wise-sounding platitudinous things that I’m ashamed to say still make me cringe despite my belief that they’re more or less true and exactly the kind of simple yet profound things that we in our busy and self-important lives usually need reminding of.2 The barista is doing his best to humor him but is starting to show some obvious irritation, to which the yappy customer is either completely oblivious or completely indifferent. All of which is totally irrelevant filler serving mainly as cover to keep me from dealing with the hideous many-tentacled research monster writhing around in my laptop right now.

Wednesday Morning, 6:30 AM

So anyway, here is the situation. I worked on the Coal Power Indicator with one of my research mentees in Spring 2022. She (apparently also a nocturnal hunter) and I met via Zoom to collect data after 11:00 PM on a March evening according to my notes. I’m looking at our data collection Google Sheet from last semester, and, frankly speaking, it’s not particularly subtle in its indication that it was put together by two overtired undergraduates in the middle of the night. The formatting is inconsistent, and the sourcing isn’t super clear, but I think I remember that the reporting organization we settled on was IEA, and a quick look at our indicator table confirms this.

Navigating to the IEA site brings me to a front page that immediately fills me with a vague sense of dread. It’s like walking into a room you’ve bombed an exam in, or maybe like accidentally opening a file for a big assignment you were stressed about and failed to finish in time. I’m poking around the site and looking for a link to an API, a data download, something…. Further unpleasant memories involving reading polydactylite data values off of graphs and mindlessly entering data are bubbling up in my head. More poking and some circuitous chains of hyperlinks bring me around to the charts I’m recalling. They present time series data on all sorts of energy indicators including Total Energy Supply by Source.

Below the chart there is a friendly looking blue button that says “Download Chart Data,” but it’s graced with a considerably less friendly looking asterisk, which helpfully clarifies in small grey print at the bottom of the page that “free data is only available for download in 5-year intervals” (italics obviously mine). Further inquiry leads me to a page from which you can purchase the full dataset for the absolutely reasonable price of $600 plus tax. Hopefully they throw in shipping and handling with that, too.

Which, let me tell you, is incredibly annoying and honestly a little bit offensive, because the data I need is published in the chart I’m looking at in the one-year intervals I want, and you can literally hover over the chart to get the exact value for each of the individual years if you’re careful with your mouse positioning. So they’re just dangling the numbers in front of me and betting that I am too lazy to transcribe them by hand. It feels like when you’re a shortish kid and your tall “friend” taunts you by holding something you want just out of your reach. If you’re going to make the numbers available at all, why not just publish them in the downloadable sheet instead of going out of your way to filter out 80% of the values in your chart and then hide the rest of them behind a paywall? I’m convinced it’s just to fuck with me, personally.

The younger manager in me encountered this situation eight months ago, certainly experienced some similar indignation, and eventually came to the conclusion that something like this was a great opportunity for teamwork. So I got my mentee on a call, and we trudged painstakingly through all the countries, in this case highlighting and copying the six- or seven-digit numbers (it turns out countries use an ungodly amount of energy) by hand for each sector of energy generation (coal, oil, nuclear, etc.), then summing them and dividing by the total to obtain the proportion coming from coal sources. Simple in principle but horrible in practice. I’m sure it took a couple hours for two people to do it last time. But sometimes hours is what it takes. After all, I got my start on the SSPI tediously collecting lots of data, and if that’s how I came up, surely that’s the best way for my mentees to learn….

That painstaking process is how the aforementioned mess of a spreadsheet I was looking at got produced. What raised alarm bells for me is that the values in our messy old spreadsheet seem to be in BTU but everything on the IEA site is in TJ. Not necessarily the end of the world, but if IEA have updated their data, changed units, or changed their methodology, we need to make sure we’ve got the right information. We can’t be using untraceable data that no longer replicates. And so it is going to be collected again, by me, right now, alone. I’ll leave the volume and timbre of the sigh I am heaving right now to your imagination.

Wednesday Morning, 6:57 AM

Under no circumstances should one hunt a tentacled beast alone under the cover of darkness. Or, for that matter, at dawn.

Wednesday Morning, 8:54 AM

Huzzah! I almost yelp out loud in Blue Bottle. Eureka! JSON ex machina!

Two hours ago, I was feeling something like an immense hopelessness. The kind of hopelessness that’s almost cliché to depict on film. We all know that scene, when the party’s over and all the people have left, and the poor teenage host, who imagined throwing a huge rager at their parents’ house would be a personal and social panacea, is left (perhaps more than) slightly hungover, lamenting something unfulfilled in their best laid plans, squinting through the morning light at upset bowls, stained rugs, and other assorted filth. The camera moves in for the close-up as they realize in slow-motion, campily exaggerated horror that there’s no way they could possibly clean the house alone by 2 PM that afternoon when their parents return from their trip. They realize they’re facing a tentacled beast, and there’s no slaying it quickly. There will be no cleaner ex machina to save them in their moment of need. Usually this scene is followed by some kind of fable-ish denouement about personal responsibility and good judgement, some stock parent-child emotional moments resolving into a deeper understanding and empathy for each other, etc., etc.

So anyway, for a few minutes back there I tried to collect the data again by hand. How bad could it be? Bad. Post-rager solo cleanup bad. Time for a fable-ish denouement kind of bad.

Down in the dumps, I tried to think of what I could do, how I might work the problem. You see, one of the great charms of computer science is the alluring possibility of the ex machina solution. Finding it evokes that special and unmistakable triumphal feeling of completing a fairy tale quest, of good triumphing over evil, of the underdog coming out on top through sheer wit and cunning. Embarking on such a quest is often foolish–every developer has spent hours writing code that relieves them of a few minutes of minor inconvenience–but it’s not always foolish. With nowhere else to turn, I started thinking about the graphs on the website. They’re interactive. There must be some JavaScript behind them. A charting library or maybe a homegrown solution. Either way, that JavaScript would be designed to handle arbitrary data, which means there must be some kind of data file that the server is passing to the library to produce the charts. If I could just get my hands on that file….

And so I set out on my quest armed with a mere shovel: Command + Option + I, the Google Chrome developer tools. I first tried digging through the Sources tab, but that got me nowhere. Sifting through the Network tab, I was losing hope when suddenly the glint of a URL caught my eye. My heart leapt.

https://api.iea.org/stats/indicator/TESbySource?countries=AUS&startYear=1990!

I clicked on it and it opened up a beautiful black window containing white text in a monospaced font. I scanned through the file, hardly able to believe my weary eyes.

{"year":"1990"," short":"AUSTRALI","flowLabel":"Total energy supply", "flowOrder":7, "flow":"TES", "product":"COAL", "productLabel":"Coal", "productOrder":1, "seriesLabel":"Coal", "units":"TJ", "value":1460680", mcountry":"AUS”}.

This might do it. This could slay the tentacled beast once and for all. Now all I have to do is write a for loop, import the range of countries I want, and…and wait a minute…what if I just?—and I lop off the query parameters countries=AUS&startYear=1990 in one quick stroke. I press Enter and hold my breath.

A few seconds pass. Nothing. Then it comes crashing in, hundreds of thousands of lines of glorious JSON, the whole of the database, a veritable treasure trove! Such a pure, beaming grin might never cross my face again. With this file in hand, what took hours of tedious human labor to compile will be done in a few seconds by a dozen lines of code. What was unreplicable before can be validated at the click of a button. The tentacled beast has been subdued, and the taunting gatekeepers of the data have been thwarted. All is right in the world.
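
For the curious, here is a rough sketch of what those dozen lines might look like in Python. It’s an illustration rather than the exact script I ended up running: it assumes the bare endpoint returns the full list of records as JSON and uses the field names from the sample record above (country, year, product, value).

    import requests
    from collections import defaultdict

    # The endpoint spotted in the Network tab; dropping the query
    # parameters appears to return records for every country and year.
    URL = "https://api.iea.org/stats/indicator/TESbySource"
    records = requests.get(URL, timeout=120).json()

    # Accumulate total energy supply and coal supply per country-year,
    # keyed on the field names from the sample record.
    totals = defaultdict(float)
    coal = defaultdict(float)
    for rec in records:
        key = (rec["country"], rec["year"])
        value = float(rec["value"] or 0)
        totals[key] += value
        if rec["product"] == "COAL":
            coal[key] += value

    # Share of total energy supply coming from coal, e.g. ("AUS", "1990").
    coal_share = {k: coal[k] / totals[k] for k in totals if totals[k]}

The division at the end is the same one we used to do by hand: coal supply over total supply for each country and year.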

Wednesday Afternoon, 3:10 PM

I’ve just finished running our last SSPI team meeting of the year. The tale of my exploits in the night was met with merriment, although it came with the news that there is still plenty of validation work to be done on the data. It was so foolhardy of me to think that one night’s work could finish everything off. It was silly of me to forget that I had a team behind me.

In August, nine new undergraduates joined the SSPI team as research apprentices. Onboarding is a challenge always and everywhere, and the SSPI is no exception. In previous semesters, we’ve told the new team members to read the working paper, and we’ve tried to “ramp them up” with some easy (viz. dull and tedious, e.g. collecting all those values from the IEA charts by hand) data-based assignments. We’ve done alright with this approach, but it’s not uncommon for the new apprentices to feel a little bit paralyzed and unsure when faced with their first few assignments. I certainly felt that way when I started. As it turns out, most wild beasts have tentacles, and it can be a bit intimidating to encounter them at first.

In previous semesters, I think Professor Brown and I have assumed that the students coming in have some foundational econometrics and data science knowledge, but the reality is that the UC Berkeley Economics department emphasizes theory, and many (perhaps most) economics majors at Cal graduate without ever running a regression themselves. The economics degree at Cal is depressingly analogous to a Computer Science department that teaches only discrete math and algorithms classes, presents some examples of production-grade software projects, and tries to explain how they work without ever having the students actually plan, write, or test a line of their own code. ECON 140 and 141 (UC Berkeley’s main econometrics offerings) are particularly poorly designed for students looking to do research. They focus on interpreting regression results and proving theorems, which I suppose makes exams and assignments easy to grade, but it leaves most students completely unprepared to apply what they have learned in a practical setting. And honestly, who else would be taking undergraduate econometrics courses except students interested in doing research? In this day and age, I think most economics classes (and especially econometrics classes) should involve looking at, sorting, filtering, and analyzing data to get students reinforcing, applying, and maybe even discovering for themselves the concepts they’re learning in class.

Early this semester, a couple of my mentees were worried about working with data and expressed concern about whether they had enough experience to make a valuable contribution to the team. The conversations I had with them really changed the way I approached my role as a research mentor this semester. To bridge the apparent skill gap, I put together some tutorial videos covering the basic tools, skills, and concepts we use over and over again on the SSPI: advanced spreadsheet techniques, regressions, data cleaning, and pulling data from REST APIs. I spent time working through problems with students at office hours. I tried my best to make myself available for long, unstructured blocks of time. It took some patience and planning to feel confident spending as much time as I did on training and mentoring, but the results from this semester confirm that the investment has paid off. I’m really pleased by the quality and thoughtfulness of the work that these students have done in the last few weeks. They’ve approached problems in unique ways, dug into the sources and definitions of data, and done excellent work with the tools we spent so much time learning at the beginning of the semester.

Which sounds great and all, but it raises the question: what was I doing, awake at unspeakable hours, hunting tentacled beasts alone under the cover of darkness? The answer is simple: I was being a bad manager. I wasn’t trusting the talent I had on my team. I thought it would be easier to just do it myself, to come back down from the mount with a finalized dataset inscribed on stone tablets. I was heedless of the nature of the beasts I was dealing with. Hubris. A classic sin. My JSON ex machina will not prevent me from attempting some fable-ish denouement morals about management and approach:

  • Invest in your team. This semester I focused much more on training students and paid closer attention to meeting them where they are. It can feel scary to assign people training work that isn’t directly advancing the team’s overall goal. But the time they spend training pays dividends.
  • Trust your team. Complex tasks demand collaboration. Even though it can feel like a lot of overhead to delegate tasks that are hard to explain, it is usually better to bring in as many collaborators as feasible, especially if timeliness is not the only or even the main goal. There are often tasks that would take less time to do myself, but taking an extra few days to bring some team members in, teach them how to do what needs to be done, and have us all work through it together builds robust working relationships and doubles down on the notion of investment in the team. Bringing the diversity of experiences, approaches, and perspectives together helps us solve problems better and smarter.
  • Know when to deploy your team. This is much more complicated and subtle than it sounds. Last year, I deployed our team to trudge through the website and collect data values by hand. This was a bad use of team attention and energy. If I had stepped back and thought about what a better way to collect the data would look like, as I did when I was trying to do it myself, I would have found a solution that is loads better by every metric. Finding this kind of solution is often a solitary or pairwise task, and from one perspective it might look like this is a lone-wolf approach that is in tension with the notion of collaboration. I don’t think this is actually the case. Rather, knowing when someone or a pair can “go off on a quest” and come back with a solution that is orders of magnitude better than what it replaces is one of the most important skills of management. Such solutions are highly valuable precisely because they free up everyone’s time to collaborate on more interesting and difficult problems: they are boons to collaboration. It can be tempting to dump tedious tasks and trust that things will get worked out one level down, but it is precisely this kind of dumping that makes institutions and systems rigid, arcane, and unresponsive. The more willing we are to look closely at what we are delegating and to rethink how we are operating, the better we will be at solving problems and moving our team forward. Moreover, it’s not just an efficiency problem: teams love solving problems and making progress, and hate having boring or dull tasks dumped on them. They run better when people are free from drudgery.

I’ve got some big plans in store for the SSPI team for 2023. I’m learning so much from working with this great group of people, and I’m so grateful for the opportunity I have to continue working with them going forward!

  1. viz. Two jackets, a pen, a half-filled notebook, my laptop, a tangled mess of chargers, my Garmin (dead and hence removed from my wrist), my wallet, a crumpled up receipt from…Chipotle, the aforementioned cup of coffee, and the Aeropress whence it came. 

  2. This whole aside is more or less stolen, both in content and style, from David Foster Wallace’s “This Is Water.”