My coworkers Rajit and Dave and I have been spending an inordinate amount of time over the last few days looking for a really annoying error in one of our apps. The exception message reads like this:
Could not load file or assembly 'Oracle.DataAccess' or one of its dependencies. An attempt was made to load a program with an incorrect format.
Yep, that's right, we're using ORACLE as our database. I assure you, this is not by choice. We've known for a while that our version of ORACLE is so old that it can remember when it played with two sticks and a rock (and was happy, dammit). Problem was, we assumed that it had to be causing this issue, though we ultimately determined that it wasn't responsible. Even better, in the process of squashing this bug we discovered that someone had probably found the fix for this issue already and utterly failed to inform us about it.
Sun Temple With Ancient Artwork, used under license
Here's the situation: the app we're building is a rewrite of a much-loved internal app for our company, using a more modern tech stack: MVC5, Dapper, etc, all of which connect to an ORACLE database which, for reasons too convoluted to discuss here, we cannot refactor or remove. On our local machines and on our "Beta" server (which is where we have our QA department do their tests), the app works fine. It connects to the ORACLE database, runs queries, returns data, the works. But when we move the app to our staging server (which runs IIS 7 and connects to production databases to allow us to do sanity-checks before deployment) the app completely loses its mind and throws the above error.
Rajit, Dave, and I figured that the issue had to be either something to do with the app pool settings or some stupid coding mistake having to do with the ancient ORACLE DLLs. So we and the server admins started poring over the app's code and the IIS setup, scanning for some simple clue as to how to drive out the insanity that ancient ORACLE and modern IIS had inflicted upon us.
And we found nothing. At least, nothing obvious.
So, in desperation, we resorted to turning things off to see if that fixed the problem. That lead to us ultimately finding, after a day and half of searching, the very setting that had been causing all of the strife we'd been experiencing this project. This bug was so subtle, so innocuous, that none of us thought to look there. In fact, all of us presumed that it couldn't even be the issue. We prematurely concluded that the newest, shiniest thing (IIS 7) was less likely to be the cause of the problem than primitive, creaky ORACLE.
See where this is going?
After much frustration, we eventually located the cause of the bug; it was a setting in the IIS app pool called "Enable 32-bit Applications". In other words, the source of all of our grief for 36 hours was this thing:
Here, let me make that a little more obvious.
Perhaps I should explain.
ORACLE cares about the bitness of the machine it is running on, meaning whether said machine runs a 32-bit or 64-bit operating system. The 32-bit DLLs for ORACLE do not work on a 64-bit OS, and vice-versa. We figured that we were in the clear, though, since the production server's OS that was running our app is 64-bit and the app itself is using the 64-bit ORACLE DLLs.
(As a side note, ORACLE caring about bitness was the reason I wrote Using 64-bit IIS Express in Visual Studio, since everyone on my team needed to flip that switch in order to be able to debug the app locally.)
Did you catch what the description of that "Enable 32-bit Applications" setting actually said, though?
If set to true for an application pool on a 64-bit operating system, the worker process(es) serving the application pool will be in WOW64 (Windows on Windows64) mode. Processes in WOW64 mode are 32-bit processes that load only 32-bit applications.
Well, crap. Maybe if we'd slowed down a bit and RTFM we'd have saved ourselves some headache.
This was, obviously, reproducible in both the "Beta" and production environments. The reason that our app didn't immediately fail on our "Beta" server was that, on said server, the application was in its own app pool with "Enable 32-bit Applications" disabled. Apparently some idiot already ran into this issue months ago and forgot to tell us about it, seeing as we couldn't come up with another reason why the app pool was set up this way (and, y'know, the server logs told us such). Regardless, once we set up the application on production in its own app pool with the setting disabled, it worked like a charm.
So, yeah, I hate this stupid-ancient version of ORACLE we're forced to us (even though it wasn't the cause of the problem). It's old, it's slow, it requires us to care about whether the server it's running on has a 32-bit or 64-bit architecture. It's tacky and I hate it.
But mostly, I hate that there is someone out there, in our own company, who took the time to read the friggin manual, figure this nightmare out, and then proceed to not leave a note to inform us devs that this dumb little IIS setting would be the root cause of this infuriating bug. If you've solved a problem that others might encounter, and the solution was non-obvious, can't you leave a comment, a sign, a carrier pigeon, something to tell the people that come after you about it? Is it really that difficult to do?
That guy? Yeah, I hate that guy the most.
Of course, that guy is me. So say the server logs. Take that, me!