Code generators need not be perfect

"Le mieux est le mortel ennemi du bien" (the best is the mortal enemy of the good), attributed to Voltaire

I’ve been working on expanding the v2tofhir code that Audacious Inquiry developed for one of our projects into a broader offering for public health that handles more than just Immunization messages. That work is not going back into the original open-source project, as it is not within the scope of that program, but it is going to become an application available to our customers. Contact me if you want to learn more.

One thing I relearned from my vibe coding session with Copilot is that code generators don’t have to be perfect, although it certainly helps if they are. I already know from years of experience working with natural language processing that heuristics can often get you most of the way there, and specialized exception handling can cover what the heuristic doesn’t.
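To make the heuristic-plus-exceptions idea concrete, here is a minimal sketch in Java. It is not the actual generator: the class name `MappingRuleGenerator`, the `convert` and `UNMAPPED` calls in the emitted code, and the simplified two-column rule shape (a V2 field and a FHIR path) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a heuristic handles the common case, and a small
// table of hand-written overrides handles the rules it gets wrong.
class MappingRuleGenerator {
    // Hand-written statements keyed by V2 field (for example "PID-5").
    private final Map<String, String> overrides = new HashMap<>();

    public void addOverride(String v2Field, String statement) {
        overrides.put(v2Field, statement);
    }

    /** Returns one generated Java statement for a single simplified mapping rule. */
    public String generate(String v2Field, String fhirPath) {
        // 1. Specialized exception handling wins over the heuristic.
        String override = overrides.get(v2Field);
        if (override != null) {
            return override;
        }
        // 2. Heuristic: "Patient.birthDate" becomes patient.setBirthDate(...).
        String[] parts = fhirPath.split("\\.");
        if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            // 3. Anything else becomes a call to an undefined method, so the
            //    compiler flags it instead of letting it pass silently.
            return "UNMAPPED(\"" + v2Field + " -> " + fhirPath + "\");";
        }
        String variable = Character.toLowerCase(parts[0].charAt(0)) + parts[0].substring(1);
        String setter = "set" + Character.toUpperCase(parts[1].charAt(0)) + parts[1].substring(1);
        // convert(...) is a hypothetical helper that the generated code would call.
        return variable + "." + setter + "(convert(\"" + v2Field + "\"));";
    }
}
```

With this sketch, `generate("PID-7", "Patient.birthDate")` yields `patient.setBirthDate(convert("PID-7"));`, while a path the heuristic cannot handle yields an `UNMAPPED(...)` line until someone registers an override for it with `addOverride`.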

By putting these two ideas together, I can generate code for the 2700+ mapping rules in the current HL7 V2 to FHIR specification in just minutes. This makes my code generator immediately effective across the 75 or so segment mapping tables in that specification. Within a few minutes, I can generate code to support every segment, and within a few hours more, I can correct all of the missing bits.

The compiler is a perfect tool for detecting the exception cases. Like a canary in a coal mine, it is especially sensitive to danger. The code generator, which I’m still developing, has also been an excellent QA resource for the HL7 V2 to FHIR spec, as it enables me to verify content in the spec. Some errors in the generated output are actually caused by typos in the more than 10,000 lines of CSV data that are used to generate the HL7 standard.
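One way to put the compiler to work as that canary is to compile the generated source in memory and collect the errors, each of which points back at a rule (or a typo in the source data) that needs attention. The sketch below uses the standard `javax.tools` API, which requires a JDK rather than a bare JRE. The class name `GeneratedCodeChecker` and the `line N: message` format are assumptions for illustration, not part of the actual tooling.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical sketch: compile generated code and list where it fails.
class GeneratedCodeChecker {
    /** Holds generated source in memory so the compiler can read it. */
    static class InMemorySource extends SimpleJavaFileObject {
        private final String code;

        InMemorySource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension),
                  Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Returns one "line N: message" entry per compiler error in the source. */
    public static List<String> findErrors(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler found; run on a JDK");
        }
        String outputDir;
        try {
            // Send any class files to a temporary directory.
            outputDir = Files.createTempDirectory("generated-check").toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        compiler.getTask(null, null, diagnostics, List.of("-d", outputDir), null,
                List.of(new InMemorySource(className, code))).call();

        List<String> errors = new ArrayList<>();
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                errors.add("line " + d.getLineNumber() + ": " + d.getMessage(null));
            }
        }
        return errors;
    }
}
```

Calling `GeneratedCodeChecker.findErrors("Sample", source)` on a generated class returns an empty list when everything compiles, and otherwise a work list of lines to trace back to the mapping rules or the CSV rows behind them.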

This is a tremendous lift for development, and as we figure out the patterns of errors, we can easily augment the code generator to fix them. While that’s being done, we still have code that we can tweak to perfection in much less time than it would take to write it manually. It also reduces the amount of subject matter expertise needed to create the application.

The original open-source code took a more manual approach to automation (I was actually trying to figure out the process using spreadsheets). It took about 3 months of work to handle about 20 segments. Easily half of that was putting together the surrounding infrastructure rather than parsing individual segments. The code generator took about a week to build. It gets very close to final code for 75 segments. That leaves me about 40-50 errors to resolve, most of which are simple. I’m expecting another week or two will be needed to handle those issues in the generated code. This is easily an order of magnitude improvement. It can be repeated on new editions of the V2 to FHIR outputs much faster than fixing the code by hand, which also speeds up maintenance.
