Long-running data feeds are a real thing after all
Over the weekend I wrote a SQL Server stored procedure, along with some user-defined functions it relies on, that cleaned up data and inserted it into a new table. It was the sort of thing that took a short time to write, but an awfully long time to execute and troubleshoot. It gave me an opportunity to learn more about SQL Server’s “Include Actual Execution Plan” and “Include Live Query Statistics” features, which pointed me toward creating an index that supposedly sped the whole process up by 30%.
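As a sketch of the kind of tuning involved (the table and column names here are hypothetical, not my actual schema): the actual execution plan in SQL Server Management Studio will often surface a “missing index” hint, which translates into a statement along these lines:

```sql
-- Hypothetical example: a nonclustered index on the columns the cleanup
-- query filters and joins on, with INCLUDE covering the columns it
-- selects, so the scan doesn't have to touch the base table.
CREATE NONCLUSTERED INDEX IX_SourceRecords_CustomerId_Status
    ON dbo.SourceRecords (CustomerId, Status)
    INCLUDE (RecordDate, Amount);
```

The hint's suggested column list is a starting point, not gospel; it's worth sanity-checking against how the query actually filters before creating the index.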
The first version of my load script ran for 9.5 hours, then bombed because it tried to insert a NULL value into the primary key of a table. Zero rows were inserted. I revised it into a second version, which the server eventually killed after more than ten hours (it timed out or over-burdened the database server, I guess) before it could complete. Zero rows were inserted. The final version ran yesterday and overnight for over 16 hours, which makes it the longest-running data feed I have ever written, by over 10 hours. This run was successful. Only 37,000 rows were inserted, but to get the data I had to comb through over 45 million records, which is why it took so long.
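That first failure mode, a NULL landing in the target table’s primary key nine and a half hours in, is the kind of thing that can be guarded against up front. A minimal sketch, using hypothetical table and column names:

```sql
-- Hypothetical sketch: exclude rows whose key column resolves to NULL
-- before inserting, so one bad row can't abort a multi-hour load.
INSERT INTO dbo.CleanedRecords (RecordKey, RecordDate, Amount)
SELECT s.RecordKey, s.RecordDate, s.Amount
FROM dbo.SourceRecords AS s
WHERE s.RecordKey IS NOT NULL;
```

Running the SELECT on its own first (or just counting the NULL-key rows) is a cheap way to find out in minutes what would otherwise fail at hour nine.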
As an auditor, I have not always believed it when IT would tell me a data feed took all weekend to run. This experience helped me understand that it can happen, especially when it is an ad-hoc job.