ETL with Python

I have been experimenting with writing ETL (extract, transform, load) scripts in Python for work. The scripts were mostly fun and easy to write, but the outcomes have been disappointing: unacceptably slow load times.

I usually work with an instance of SQL Server that runs locally on my computer. I use it for data collection, analysis, and reporting, and have never needed a shared server. In the local environment, I have always loaded .csv files into staging tables via BCP or the BULK INSERT command in T-SQL scripts; then I use T-SQL queries or stored procedures to load the data into the destination tables. This approach is very fast, but it only does the “L” (load) part of ETL. It also doesn’t work for remote databases unless I have access to a filesystem that the database server can read.

A Python script using a library such as PETL or Pandas, on the other hand, can do the whole thing: read in a file in just about any format, transform the rows and columns in various ways, then load the result into the database.
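To make that extract/transform/load shape concrete, here is a minimal sketch that uses only Python's standard library rather than PETL or Pandas. It assumes a hypothetical CSV with `name` and `amount` columns and a hypothetical `payments` table, and it uses an in-memory SQLite database as a stand-in for SQL Server so that it runs anywhere.

```python
import csv
import io
import sqlite3
from typing import Iterable, Iterator, Tuple

# Hypothetical example: the column names, table name, and transforms are
# illustrative assumptions, and SQLite stands in for SQL Server.


def extract(csv_text: str) -> Iterator[dict]:
    """Extract: yield each CSV row as a dict keyed by the header row."""
    yield from csv.DictReader(io.StringIO(csv_text))


def transform(rows: Iterable[dict]) -> Iterator[Tuple[str, int]]:
    """Transform: tidy up names and convert dollar amounts to integer cents."""
    for row in rows:
        name = row["name"].strip().title()
        cents = round(float(row["amount"]) * 100)
        yield (name, cents)


def load(conn: sqlite3.Connection, rows: Iterable[Tuple[str, int]]) -> int:
    """Load: insert every row in one transaction and return the row count."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.executemany(
            "INSERT INTO payments (name, amount_cents) VALUES (?, ?)", rows
        )
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payments (name TEXT, amount_cents INTEGER)")
    sample = "name,amount\n  alice smith ,12.50\nBOB JONES,3\n"
    print(load(conn, transform(extract(sample))))  # prints 2
    print(conn.execute("SELECT * FROM payments").fetchall())
    # prints [('Alice Smith', 1250), ('Bob Jones', 300)]
```

Because each stage is a generator, rows stream through one at a time instead of the whole file sitting in memory; pointing this at a real server would mostly mean changing `load` and the connection setup.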

Lately I have been working with various databases that are hosted somewhere in the cloud. They are the slowest databases I have ever worked with, and I don’t have access to a filesystem they can BULK INSERT from. Writing an ETL script lets me work around that problem easily. By far the hardest part of writing my first ETL scripts in Python was getting a working SQL connection string. (I think that part would have been a cinch if our database were something open source like PostgreSQL.)

Unfortunately, my simple ETL script could load a table at the rate of 5 records per second, which is comically slow. At that rate, our data will take days or weeks to load. I’m not sure what the bottleneck is. It is probably slow network transfer time on top of a hugely sub-optimal data insertion method on the database side. I think that, in the end, it may not be worth figuring out. I should probably try a better tool for the job.
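If the bottleneck really is one round trip (and possibly one commit) per row, sending rows in batches is the usual first thing to try. The sketch below is a generic batching helper, not a diagnosis of these particular databases; `load_in_batches` and the default batch size of 1,000 are hypothetical choices, and SQLite again stands in for the real server. With pyodbc against SQL Server, setting `cursor.fast_executemany = True` before calling `executemany` is the option that sends parameters in bulk; whether it helps with these cloud databases is untested here.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Sequence


def load_in_batches(
    conn: sqlite3.Connection,
    sql: str,
    rows: Iterable[Sequence],
    batch_size: int = 1000,
) -> int:
    """Insert rows in fixed-size batches, committing once per batch.

    Sending many rows per executemany() call, instead of one INSERT and one
    commit per row, cuts the number of round trips to the server.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(rows)
    total = 0
    while True:
        batch = list(islice(it, batch_size))  # next chunk of up to batch_size rows
        if not batch:
            break
        with conn:  # commit this batch, or roll back only this batch on error
            conn.executemany(sql, batch)
        total += len(batch)
    return total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    n = load_in_batches(conn, "INSERT INTO t (x) VALUES (?)", ((i,) for i in range(2500)))
    print(n)  # prints 2500
```

Committing per batch also means that if a load fails partway through, earlier batches stay committed and only the failing batch is rolled back, so it is clear where to resume.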

Enable Recycle Bin on mapped network drives

My favorite tech discovery of the week was this one: an old TechNet post titled “Enable Recycle Bin on mapped network drives.” This article helped me figure out how to get the Windows Recycle Bin working on my primary documents folder, which is a mapped drive that isn’t really a mapped drive.

For historical reasons, I store my documents on a mapped drive that is actually a deeply nested folder on my hard drive. Several years ago, my company implemented Microsoft OneDrive for file sync, sharing, backup, and so on. While that is great, it did mean that I had to move my folder tree to a very long path. Because I like to use long, descriptive names for everything, I ended up bumping into “file path too long” errors, especially in Excel. Excel will not even open a file whose path is over 256 characters long.

When the Microsoft OneDrive root folder already takes up almost 100 characters, too few are left for my own folder and file names. To get around the character limit, I used the SUBST command in a startup script to substitute a drive letter (Z: in my case) for a folder on the C: drive.1 The downside of using SUBST is that you get no Recycle Bin support.2 To work around that limitation, I used Far Manager rather than File Explorer as my file manager, because Far Manager actually moved deleted files to the Recycle Bin. Unfortunately, Far Manager’s text-only interface and general jankiness led me to abandon it for Double Commander, which suits my needs better but does not move deleted files on my Z: drive to the Recycle Bin.

The TechNet post I found describes how to make a .reg file that updates the Windows Registry and creates a new mapped drive to whatever location you want. You create the file, run it once, and you are all set. Unlike a drive mapped with SUBST, the new drive persists across reboots, and it has full Recycle Bin support.

I love it, and think I should try to write a PowerShell or Python script that will create such mappings without having to edit a .reg file manually.
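As a possible starting point for that script, here is a hedged Python sketch that generates only the Recycle Bin half of such a .reg file; it does not create the drive mapping itself. The registry layout it writes (a new GUID under `User Shell Folders` pointing at the folder, plus a matching key under `BitBucket\KnownFolder` with `MaxCapacity` and `NukeOnDelete` values) is an assumption based on the commonly circulated version of this trick, not on the TechNet post itself, so compare it with the post before importing anything. The function names and the default capacity are hypothetical, and `MaxCapacity` is assumed to be in megabytes.

```python
import uuid
from typing import Optional

# ASSUMPTION: these key paths and value names follow the commonly shared
# "Recycle Bin on a network drive" registry trick. Verify them against the
# TechNet post before importing the generated file.
USER_SHELL_FOLDERS = (
    r"HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion"
    r"\Explorer\User Shell Folders"
)
BITBUCKET_KNOWN_FOLDER = (
    r"HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion"
    r"\Explorer\BitBucket\KnownFolder"
)


def reg_escape(value: str) -> str:
    """Escape a string value for a .reg file (backslashes and double quotes)."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def make_recycle_bin_reg(
    target: str, max_capacity_mb: int = 10240, guid: Optional[str] = None
) -> str:
    """Return .reg text that gives `target` its own Recycle Bin (hypothetical helper)."""
    if guid is None:
        guid = "{" + str(uuid.uuid4()).upper() + "}"  # fresh known-folder ID
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{USER_SHELL_FOLDERS}]",
        f'"{guid}"="{reg_escape(target)}"',
        "",
        f"[{BITBUCKET_KNOWN_FOLDER}\\{guid}]",
        f'"MaxCapacity"=dword:{max_capacity_mb:08x}',  # assumed to be megabytes
        '"NukeOnDelete"=dword:00000000',  # assumed: 0 = send deletions to the Recycle Bin
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    # Hypothetical target. regedit reads UTF-16 .reg files, and newline=""
    # keeps the explicit CRLF line endings from being doubled on Windows.
    with open("recycle-bin.reg", "w", encoding="utf-16", newline="") as f:
        f.write(make_recycle_bin_reg("Z:\\"))
```

Generating a fresh GUID on each run is what would let the script create more than one such location without hand-editing; passing a fixed `guid` makes the output repeatable for inspection.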


  1. Believe it or not, I still remembered the SUBST command from my MS-DOS days in the late 1980s and early 1990s.

  2. I take it for granted that it is preferable to have the safety net of the Recycle Bin, in case I delete something accidentally.

Robocopy

Much to my surprise, I have spent the past few days at work relying heavily on a very old Windows tool, Robocopy. Part of my work this week involves moving very large files to and from a network share over a very slow VPN connection. Those file transfers were hosing my file manager, Double Commander, so I started running them from Windows Terminal via Robocopy. I hadn’t really used it in about 20 years, since the days when I wrote batch jobs to deploy my software or move data files around, but it still works.

It defaults to copying folders, but it is easy to configure the command to move individual files or groups of files. I love how it provides a summary of what it is going to do and shows a completion percentage as it copies or moves each file.

Tonight, I learned of a feature that I should have been using the whole time: the /compress flag. It will request network compression during the file transfer, if it is available. I will try that tomorrow to see if it speeds up those enormous file transfers I have been doing.
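For scripting those transfers, here is a small hedged Python sketch that builds and runs a Robocopy move with network compression requested. The helper names, share, and paths are hypothetical. `/MOV` (delete source files after copying) and `/COMPRESS` are real Robocopy options, but only newer versions of Robocopy recognize `/COMPRESS`, so older Windows builds may reject it. Robocopy's exit code is a bitmask in which values of 8 or higher signal failures, so a nonzero code on its own is not an error.

```python
import subprocess
from typing import List, Sequence


def robocopy_move_args(
    source: str, destination: str, files: Sequence[str] = (), compress: bool = True
) -> List[str]:
    """Build a robocopy command that moves files (copies, then deletes the source)."""
    args = ["robocopy", source, destination, *files, "/MOV"]
    if compress:
        args.append("/COMPRESS")  # request network compression when supported
    return args


def run_robocopy(args: Sequence[str]) -> int:
    """Run robocopy, raising only when its exit code signals a failure."""
    code = subprocess.run(list(args)).returncode
    if code >= 8:  # 8 and above: some files failed to copy, or a fatal error
        raise RuntimeError(f"robocopy reported failures (exit code {code})")
    return code  # 0-7: no failures; bits report copied, extra, or mismatched files


if __name__ == "__main__":
    # Hypothetical share and folder names.
    run_robocopy(robocopy_move_args(r"\\server\share\exports", r"C:\data", ["*.bak"]))
```

Keeping the argument list separate from the call makes it easy to print or log the exact command before a long transfer starts.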

I’m setting up a web server for my five-year-old son to play with on an old Raspberry Pi that fell into disuse a long time ago. I was delighted to discover that Raspberry Pi Imager now exists to make OS installs a breeze.

When I used Mac OS 8

Seeing Mac OS 8 emulated in a web browser today brings back some pleasant memories of my years as a Mac tech in college. Most of the machines I worked on for that job ran OS 8 or OS 8.5, which to my Windows-centric mind were beautiful and fun to work on.1

For two years, I worked in the theatre building with one of my best friends. We were in charge of keeping the theatre professors’ computers, and almost all of the computers in student areas, working. That entailed defragging hard drives, installing productivity software, setting up backups, installing RAM, replacing laptop keyboards, performing OS upgrades, and so on.

When I first got the job, I had never even used a Mac before. The only Apple computers I had ever touched at that point were the Apple II and Apple IIGS in my middle school and high school, which I had used almost entirely for word processing. I would not have gotten the job if my friend had not vouched for me during the interview process. It turned out that my experience messing around with Windows software (warez mostly, at the time) and reinstalling Windows every six months, after it inevitably bogged down, made me somewhat overqualified. If anything, fixing problems on a Mac was a lot easier than fixing similar problems on my PC.

My friend and I—and eventually a third person whom I met only a few times—whipped the theatre building’s computers into shape within about a year. In my last year of college, the job became make-work for me. I set my own hours and did largely whatever I wanted. People approved of my work, but a lot of it probably didn’t need to be done.

That year, I spent many hours in FileMaker Pro building a sophisticated inventory system for the theatre’s hardware and software. I was a bit obsessive about it. Some nights I couldn’t sleep because I was thinking too much about programming or database design problems. Well after midnight, I would get out of bed, walk across campus to the theatre building, find my way inside2, and program in my little office for hours.

Meeting those theatre students—most of whom I would never have crossed paths with if not for this job—was one of the pleasures of working there. I remember that grad students were especially friendly and chill; they seemed like students from a completely different school than the competitive, stressful one I attended. By far the best part of the job, though, was that I got comped two tickets to every show at the theater, whether it was a student production or a professional one. I went to every single show, which otherwise would have been far outside of my budget, and loved almost all of them. The experience kicked off a life-long love of theater…and of Macs, too, of course.


  1. A few of the older Macs in the building ran the last version of System 7 and had monochrome monitors. Even then they seemed like relics. Still, I learned a lot about HyperCard on them, so I have a soft spot for them, too.

  2. After hours, the theatre building was always locked. If you worked there, though, you knew how to get in. Even after midnight, there were often students doing theatre work inside, and none of them were supposed to be there, either.

Prepping my old MacBook Pro for my son

My five-year-old son is really into coding. He doesn’t know how to do anything yet, but he wants to learn. Currently he plays around with Swift Playgrounds a lot on his iPad (which is a little too old and slow to run it properly). He also orders his grandfather to program a collection of JavaScript games, utilities, and other doodads for him on a personal website. I’m going to start him on Scratch and Swift Playgrounds on the Mac. Because he is very interested in websites, I promised to introduce him to HTML and “Hello World!”-level JavaScript.

To get the ball rolling on this endeavor, I decided today that I will let him borrow (or use most of the time) my 15” 2013 Retina MacBook Pro. It’s a little too big for him, screen-wise, but it’s the only machine I have. Tonight, I spent a half hour prepping it for him. Tomorrow, I plan to set up NextDNS for web content filtering, and also to explore what other safeguards I can put on the machine to keep him out of trouble. He has a history, believe it or not, of setting up accounts on internet services and contacting the companies via web forms or email. I locked down his iPad, but he started to find ways around it today. I know I will have to keep a close watch on him.

I know he’s going to love using a real computer, not only for programming, but for Pages, GarageBand, and Photos as well. I hope that things go well.

Thinking about the new Apple products

Since watching the Apple announcement today and learning about the Mac Studio, I have never been happier that I did not wait for a more powerful Mac mini before I bought mine. The Mac Studio is more computer than I need and is out of my price range (for a desktop at least). I am thrilled that it exists for other people to use, and will happily listen to the pundits talk about it for the rest of the month. For myself, though, the M1 Mac mini is more than enough, and the one I bought cost about $800 less than the least expensive Mac Studio.

The M1-based iPad Air is more relevant to me. I would love to hand off my current fourth-generation iPad Air to my daughter and buy the new one for myself. I don’t need the performance for much, but for photo editing it will be a leap forward. I don’t think my daughter would ever go for this plan, though; my iPad Air is blue, not pink. I should have bought a neutral color.

Knock-off Laser Toner

Tonight I performed some surgery on my color laser printer’s empty toner cartridges and installed knock-off ones in their place. I feel a little dirty, but I saved about $400.

I resent that toner cartridges now have microchips in them that are required for the printer to print. The chips help the printer report its toner levels, but otherwise they are there to turn a rather generic toner cartridge into something proprietary and overpriced. The knock-off toner cartridges I bought came with tools and instructions for transferring the chips from the original printer cartridges to them. It was pretty easy to do, and the printer prints in color again with the new cartridges installed.

The printer will always report low toner now, no matter what the toner level actually is. I expect to field questions about it from my family for the rest of my life.

A miracle

Digital media is a miracle. It is infinitely reproducible with no loss of quality. The internet and all the devices we have make sharing digital media less expensive, on a marginal basis, than was possible via any other technology that came before it. We have invented a way to make some resources—like art and entertainment—effectively unlimited and nearly free.

We take it for granted now, and moneyed interests are busy trying to dismantle it with blockchains as I write this, but I think we should all step back sometime and consider how incredible and wonderful it is. The miracle of digital media isn’t that it can be made finite; it is that it is infinite. We should embrace that miracle rather than try to replace it with something mundane.

web3

I advised a colleague today to research web3, because I think it may be the most interesting InsurTech topic of the year. I don’t think that glomming cryptocurrency and smart contracts onto the web is a good use of technology at all. The idea is especially dubious, and that is the reason it is interesting to me. I honestly think that web3, along with the cryptocurrencies that make it possible, is based on a long con.

Web3 appears to be an attempt to make cryptocurrencies useful; otherwise they are useless, except as a speculative asset or a way to pay the criminals who ransomware-attacked you. Web3 will require you to have a digital wallet to pay for and log into any website with a web3-type paywall. The rest of web3 is just another name for smart contracts, which, as far as I can tell, are an interesting idea that has not caught on very well in real-world applications.

Perhaps the scope of both my research and my imagination is too narrow, but I don’t think web3 is going anywhere outside the venture capital community, at least not for a very long time.

Double Commander

I am on a mission to replace Far Manager, a Windows file manager that I really love and have used for over a year. Far Manager is a text mode file manager that has been in development since the 1990s. It is a lot like Norton Commander, which I used briefly in my DOS days. I like how fast the UI is, how easy it is to navigate the filesystem, and also how easy it is to read the file and folder names in text mode.

Unfortunately, it has a few drawbacks that have been driving me crazy. First, opening Visual Studio Code from it, which I have to do all the time, will often mess up the UI and require a restart. Second, the keyboard shortcuts—many of which I have memorized—are bonkers. The left and right shift keys act as completely different modifiers, and so do the left and right control keys. This is not a problem on my standard ANSI keyboard, but I am trying to move to an ortholinear keyboard that doesn’t have two shift keys or two control keys, so some of the functionality I rely on is inaccessible.

Today I found another orthodox (two-panel) file manager that runs on Windows, has a full graphical UI, and is very, very customizable. It’s called Double Commander. Like Far Manager, it is free, and it has the two-pane interface I love. Unlike Far Manager, it lets you customize nearly every part of the user interface, including all the keyboard shortcuts. I was able to pare down the default toolbars to a minimum, color the interface to have white text on a navy blue background, and learn the few keyboard shortcuts I need to know without any trouble. Prior to learning about it today, I thought I had tried all the orthodox file managers for Windows. Double Commander is my favorite of the bunch.

I should never have bought my daughter an iPad with only 32 GB of storage on it.

I never imagined she would use it as a camera and fill it with photos and videos. iCloud Photo Library has been enabled from the start, and we have plenty of cloud storage space. Unfortunately, Photos will not free up space no matter what I do. If you delete a photo or video from Photos, it is deleted everywhere, so that is not an option.

Tonight, as I update iOS with the iPad tethered to my Mac, I am weighing my next move. Do I wipe the iPad and set it up as new, which would at least forestall the problem? Or do I simply turn off iCloud sync, wait a while for the photos on the device to be deleted, and then turn it back on? I can’t remember if I have tried the latter before. I think I may wipe the device, because my daughter’s Notes app has 39 notes and is taking up over a gigabyte of space, too. Strange things are afoot.

One last thing about the Apple TV today

It puzzles me why the optimal settings for HDR and framerate are not selected by default. I always have to go to the audio/video settings and set the frame rate to “match content”. If I don’t, I get audio drift (lip sync problems). I also learned tonight that I have to set the default video output to non-HDR because the Apple TV menus look terrible in HDR, but then set the HDR setting to “match content.” That’s the sweet spot for me.

The top of the new Apple TV 4K remote looks just like the iPod click wheel. Unfortunately, it does not work anything like one; you can’t slide your thumb around the perimeter to scrub. I don’t get it. It seems like a big missed opportunity to me.

Why do I even bother running a home file server?

Over the past year, my TrueNAS Core server has been bugging me every few months about one of my boot drives failing. In my case, the boot drives are simply two USB sticks, run in a mirrored configuration. If one fails, the other one handles the load. (The reason I use a USB stick for a boot drive is that my server has an internal USB port for that very purpose, and no other place for the boot drive to go.)

This week, after yet another USB stick failed, I tried to resolve the problem by buying new USB sticks and installing TrueNAS onto them. The OS installs took hours and the server would not reboot. Eventually I discovered that I could buy an SSD drive with the form factor of a USB stick, which is what I have wanted for years. It fits inside my server like a USB stick, but contains a fast, durable SSD drive instead of slow, fragile flash memory. I installed TrueNAS onto it, which seemed to take seconds rather than hours, but I could not boot from it. I spent an hour swapping USB drives and trying to boot the server until it would not boot TrueNAS from any USB drive I had. It was a disaster.

I gave up and installed Ubuntu Server onto the new SSD drive. I knew that Ubuntu supports ZFS now and could import my existing data pool. Luckily, it boots like a champ. After the install, I found some instructions to help me set up ZFS, Samba, and MinIO, and—after editing file permissions—everything is set. I didn’t lose any data during the OS switch, but I did mess up my Arq backups when I tried to move them into a new MinIO storage location. Fortunately, I wasn’t depending on those backups for anything, because my important data is in the cloud and I have backups on Backblaze B2 as well.

I use the NAS daily for media sharing and backups. It takes very little maintenance except once in a while, when it becomes a headache and a money sink. I sometimes wonder if I should go back to having an external hard drive instead, now that 8 TB external hard drives are pretty cheap. The problem with that approach is that external hard drives get hot and die, which happened to me so many times that I bought a NAS.

I will miss running TrueNAS (which used to be called FreeNAS) because I have run it for ten years and used to be really into its FreeBSD underpinnings. Now, all the servers I rely on run either Ubuntu Server or Debian, and I will just have to deal with administering them via SSH rather than via a web page.

Maybe It’s Just a Product Nobody Wants

I love how Matt Birchler compares crypto to BitTorrent in his latest blog post, “Maybe It’s Just a Product Nobody Wants”:

I don’t think crypto is going to disappear, by the way. I think it will always have a place in the world, but much like bittorrent before it, it was new & exciting, people tried to use it for basically everything, and then it settled into being used for, well, nothing for most people. Blockchains likely have a more prominent future, but there’s a lot of spaghetti being thrown at walls right now, and I think very little of it will stick because it’s not actually making better products.

I’m a crypto skeptic who thinks a lot about blockchain for work-related reasons. I dislike blockchain technologies because I think that, in the real world, they would fail to eliminate trusted intermediaries in financial transactions. Establishing trust without intermediaries is the whole point of blockchain.

I believe people and businesses are too risk-averse to do away with intermediaries like governments (who offer useful things like a legal system and deposit insurance) and the technology providers, agents, and brokers who make business work today. If crypto really takes off, and the old intermediaries are pushed out, I think that new intermediaries will pop up to fill in the gaps left behind. That future—blockchain with trusted intermediaries—is no better than what we have now, and is in many ways worse.

I wonder what specifically happened this week to make “web3” a topic on every tech news site and every tech podcast I listen to.

The new MacBook Pros…wow!

Today’s new MacBook Pro models are the first in a very long time that seem too “pro” for me. Their processor specs make my M1 Mac mini—which is fantastic—seem pretty pathetic by comparison. I’m not in the market for a new machine, but I look forward to reading the reviews.

After a long time, it feels great to be a Mac user again, because there are really good products up and down the line.

ISO 2145

I never knew there was an ISO standard for numbering document sections. I kind of love it and kind of hate it.

I am the sort of person who has developed a deep preference for the ISO 8601 date format, so it may be inevitable that I end up adopting ISO 2145, too.
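For the curious, the core of ISO 2145 numbering is simple: arabic numerals for each level, separated by full stops, with no full stop after the last number (1, 1.1, 1.2, 1.2.1, 2, and so on). The hypothetical helper below is a small sketch that turns a sequence of heading levels into numbers in that style; it ignores the rest of the standard.

```python
from typing import Iterable, Iterator, List


def iso2145_numbers(levels: Iterable[int]) -> Iterator[str]:
    """Yield ISO 2145-style section numbers for a sequence of heading levels.

    Level 1 is a main division. Each level gets an arabic numeral, levels are
    joined with full stops, and no full stop follows the last number.
    """
    counters: List[int] = []
    for level in levels:
        if level < 1 or level > len(counters) + 1:
            raise ValueError(f"invalid heading level {level} after level {len(counters)}")
        del counters[level:]  # close any deeper subdivisions
        if len(counters) < level:
            counters.append(1)  # first subdivision at this level
        else:
            counters[level - 1] += 1  # next sibling at this level
        yield ".".join(str(n) for n in counters)


if __name__ == "__main__":
    print(list(iso2145_numbers([1, 2, 2, 3, 1, 2])))
    # prints ['1', '1.1', '1.2', '1.2.1', '2', '2.1']
```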

Incremental improvement

I think all the products Apple announced this week are great. I am tired of hearing that some people are disappointed that they are only incremental improvements on the previous models, or that Apple isn’t exciting anymore. That’s true of every product that isn’t a new invention. And how exciting could the umpteenth smartphone be?

The steady ratchet of incremental improvement is one of humankind’s greatest achievements. The entire modern world is built on it. It’s how you get from good to great. Complaining about it is nonsensical.

After thinking about it quite a bit (too much, really), I ordered a 4-pack of Apple AirTags. I think it will be useful for my wife and me to have them on our keychains and in our go-bags, especially because we are spending more time away from home now. I look forward to kicking the Tile app off my phone.

Back to Feedly (for now)

I have given up on Reeder 5’s native RSS feed handling with sync via iCloud. I had expected my devices to stay in sync, and Reeder to sync in the background rather than only when I opened it. I think I asked too much of it. I use three iOS devices each day, and Reeder was showing me stale content on all of them at various points each day. I suppose local device syncing of 75 feeds was a bit too much for it to handle quickly and smoothly. I moved my RSS subscriptions back to Feedly and am once again satisfied with multi-device support. I think I have learned that a web service works better for me, which probably should come as no surprise.

Apple to Pull ‘iDOS 2’ DOS Emulator From App Store

I am neither surprised nor disappointed by Apple’s impending pull of “iDOS 2” from the App Store. The whole point of buying Apple hardware has always been to buy into a unique ecosystem: the “walled garden.” While the Mac has never been locked down to Apple’s Mac App Store, the iPhone has always been locked down to its App Store. It’s easy to forget now, but the iPhone grew out of Apple’s prior consumer electronics smash hit, the iPod, which was completely locked down. The iPod didn’t have an App Store, and at first, neither did the iPhone.

I hate to side with one of the world’s biggest companies here, but I totally believe that the iPhone is a console, as Steve Jobs described it. I knew that going into the iPhone ecosystem, and that’s actually what I wanted, and still want, from that ecosystem. I want an apps console that (for the most part) just works, and doesn’t require a lot of my time and effort to work smoothly and securely. I came at this from the other side: an Android user who jailbroke and hacked his phone into something completely different from what its manufacturer and mobile data provider wanted or intended. The thing is, all that customization led to a system that was unstable, and I had no idea if it was secure at all, because the code (apps and OS) came from a bunch of different places. I just had to trust, blindly, that everything was OK. The iPhone imposed guardrails on my hacking activities—guardrails that I wanted, because what I was doing wasn’t working for me anymore.

I think a lot of people chafe at the idea that their most useful device is a console because we reserve that word for entertainment devices like the Nintendo Switch and the PlayStation, or the humble cable set-top box. It doesn’t matter, though, if the iPhone is more useful and more important than a video game system. What matters is how it is sold.

It isn’t exactly a secret that normal customers can only download iOS software from Apple’s App Store. Beyond hardware, access to that App Store is the fundamental thing being sold by Apple. Customers should know it when they are choosing a product. I doubt that any of the complainers and hand-wringers commenting on this article on MacRumors were unaware of that when they bought their iPhones.

I would love to run Windows games on my Nintendo Switch, but I can’t because it is a console and Nintendo does not allow it. That doesn’t surprise me, or anybody else for that matter. The situation is not really different than the one with the iPhone. If you want to run DOS on your mobile phone, the far-more-open Android universe is there for you—and it’s the most dominant OS platform in the world, too. Vote for it with your wallets and your time.

Playdate

Panic’s bright yellow pocket game system, the Playdate, looks cheerful and cute. I want one, plus the blocky stereo dock, just to put on my desk like other people place toys and figurines. Ars Technica published a review of the hardware and a preview of some of the games yesterday, which whetted my interest. I don’t know if it is worth the money for me to have a geeky objet d’art for my workspace, though.

The Steam Deck

The Steam Deck looks fantastic. It’s like a Nintendo Switch for Steam games. Just days ago, I wrestled with a Windows laptop for 8 hours to get it ready to play Steam games, and the very next day the whole setup stopped working with the same Xbox 360 controller I had been using the day before. After that, I never want to bother with PC gaming again. It’s too fiddly and too expensive. But this device might be easier to set up, and more performant, than the $2,000+ (but not specced for gaming) Windows laptop I already have.