PowerShell for Data Transformation: StringData
To borrow from a favorite web comic: whenever I learn a new skill, I concoct elaborate fantasy scenarios where it lets me save the day. Actually, it feels like I don’t need to concoct elaborate scenarios, the scenarios seem to find me after I’ve coincidentally learned an applicable skill.
PowerShell to the rescue!
This week, we lost track of the “good copy” of the script that parses the FHIR Operation Outcomes generated by one of our engines. The “integration record” component wasn’t quite in the right format. It’s definitely not the expected CSV, but it’s also not valid JSON. What to do?
In a previous blog, I discussed how we can parse JSON files, specifically the FHIR Operation Outcomes. The pipeline in question is the basis for a function that I’ve put in my $PROFILE.
function Get-OOIntegrationRecord {
param (
[Parameter(Mandatory, ValueFromRemainingArguments)]
[string]$outcome_file
)
Get-Content $outcome_file | ConvertFrom-Json | Select-Object -ExpandProperty extension `
| Where-Object url -eq 'http://upmc.com/fhir/integration/logging/StructureDefinition/integration-record' `
| Select-Object -ExpandProperty valueAttachment | Select-Object -ExpandProperty data | ConvertFrom-Base64
}
This function has already proven itself handy for pulling the original record out of an Operation Outcome, but I didn’t realize that I’d actually grab this tool for a real production issue. We have a script that pulls this same payload out of an operation outcome, parses it as a CSV row, adds it to a data structure, and then after it’s parsed a directory of files, it outputs the CSV to a file. One of our pipelines doesn’t output the right format. We have a separate copy of the script somewhere that can handle this format, but we’ve lost track of it. It might be where it belongs, but it wasn’t the currently-checked-out copy on my laptop and nobody else seemed to know if theirs was the good one except for the two who had unsuccessfully tried their copies.
I didn’t immediately know what was wrong with the formatting. I think I’d had the good copy of the script and kept it separate until recently when I reorganized my git directory. In this particular engine, when we create our error outcome, we write a Java PatientDocument
object using .toString()
. The resulting object looks almost like JSON, but not quite.
Valid JSON:
{ "key": "value", "key": "value" }
This:
{key=value, key=value}
Fortunately, the key-value pairs can be easily parsed with ConvertFrom-StringData
provided we get rid of the curly braces and put all of the key-value pairs on their own lines. The ease of doing this inline really blew me away…
$(Get-OOIntegrationRecord <json_file>) -replace '[{}]', '' -replace ', ', "`n" | ConvertFrom-StringData
All I had to do was wrap Get-OOIntegrationRecord
with $()
and then add two inline -replace
, which can accept a RegEx… so a simple character class with the curly braces replaced with an empty string gets rid of those, and replacing the comma and space with a newline, noting that PowerShell uses the backtick ` to escape a newline to avoid confusing it with the backslash \ that Windows uses as the path separator.
Once we pipe that through ConvertFrom-StringData
, the key-value pairs are turned into a Hashtable, which can easily convert to CSV with one last little transformation: another -replace
to get rid of the null values.
$($(Get-OOIntegrationRecord <json_file>) -replace '[{}]', '' -replace ', ', "`n" | ConvertFrom-StringData | ConvertTo-Csv -Delimiter '|' -UseQuotes Never) -replace '\|null', '|'
Where this method really shines is when you pipe a bunch of these files through at once:
(Get-ChildItem *.json | ForEach-Object { $(Get-OOIntegrationRecord $_) -replace '[{}]', '' -replace ', ', "`n" | ConvertFrom-StringData } | ConvertTo-Csv -Delimiter '|' -UseQuotes Never) -replace '\|null', '|'
All of the objects roll up into a single CSV file. Not bad for data that wasn’t formatted correctly!