PS Module for DocumentDB

PS Module for DocumentDB

First off I’d like to mention briefly about the potential I see in using DocumentDB in Azure.

I’ve never gotten into the database scene, other than from the infrastructure design side. However, after seeing how easy to use this is, and how I can utilise it to store data in a format any Azure person should be familiar with (JSON), I just couldn’t help but have a play.

The problem I found is that there were no real “solutions” for interfacing with DocumentDB via my favourite language; PowerShell, other than a few suggested scripts people had written but nothing fully covered creation, update and querying of DocumentDB in one.

So I set off on a journey to write a Module that everyone could use to take the pain out of it, it is also capable of using Azure Automation, because why would you want a module without that capability now days?

With DocumentDB your data will sit in JSON format in what is referred to as a “Document” this can have several layers depending on the JSON, the fields are able to be queried. Every Document must have an ID field, this can be something relevant for identifying or a GUIDas long as it is unique. It is important to think about your data structure carefully before you start creating.

Firstly you will need the module, you can easily get this from PSGallery or via this powershell command

install-module -Name DocumentDB -Repository PSGallery

ddbdemo1


Databases and Collections

Firstly an example of the Functions that will list the databases and the Collections

Get-DocDBDatabases and  Get-DocDBCollections will do as suggested and retrieve the Databases in the DocumentDB account or Collections in the specified DB, this not only to list the DBs but also a quick way to test your access key works, before you try uploading or querying documents.

Get-DocDBDatabases -accountName $AccountName -key $key

ddbdemo2


ddbdemo3


DocumentDB Account keys are mainly for admin use and cover all resources within that account, so keep them safe.

Create Documents Demo 1

For creating / updating a document use Set-DocDBDocument, I have included the option of passing a PSObject Using the -PSDocument option or if you have your JSON pre-formatted use -JSONdocument

In the following example I am using is a CSV of user data.


ddbdemo4


This data doesn’t contain any id field, which is needed for a valid Document to be created. So using recursion I will add a GUID to each record with the Add-Member CMDLet and then upload.


$csv = import-csv 'c:\temp\Demo.csv'

foreach($row in $csv)
 {
 $row| Add-Member -MemberType NoteProperty -Name id -Value $(new-guid).guid

 Set-DocDBDocument -DBName $Database -accountName $AccountName -key $key -collection $Collection -PSdocument $row|out-null
 }


ddbdemo5


I used the -PSDocument parameter, this is because it was a Powershell object and not JSON, this funtion does the conversion for you when you use that parameter.

Now that some data has been loaded  we can run a query, in this example I’m looking for a name, and not using the ID, which I why a GUID was good enough for the id field. You can however choose to have the id field contain the names, as long as you can guarantee they will be unique, else you risk over-writing documents.


$Query = @"
{
"query": "select * from $collection coll
where coll.FirstName = 'Robert' and coll.Surname = 'Burks'"
}
"@

New-DocDBQuery -DBName $Database -accountName $AccountName -key $key -collection $Collection -JSONQuery $Query


ddbdemo6

As you can see here, it has pulled back the record for Robert Burks, it is also not in JSON format, it is a PSObject again, this makes it easy to manipulate using native commands. To convert this to json we can simply pipe it into the ConvertTo-Json CMDLet.


ddbdemo7


Create Documents Demo 2

So now I’ll try uploading a pre-formatted JSON document.

ddbdemo8

In this instance, what I have done is put an id field and data field as the parents and all other records are children of the data field.

ddbdemo10

I upload it using the -JsonDocument parameter, I also didn’t pipe this into |out-null like last time so it returns data to the console.

When running a query the New-DocDBQuery function will clean all of those additional fields out and return the mutable data, however you can use the -NoClean switch to get all the data returned as seen above. The clean data is returned as per the query shown below.

ddbdemo11


There is one important thing to be aware of in this example; I have only created one document, this is great for not having to manage IDs, however DocumentDB does not have a native ability to update only a single field value, so any changes you make will require you to pull down the entire document mutate it and then upload it again. So in my opinion the first example is a better way to structure the data.

Have a play and see what creative ways you can find to use DocumentDB for storing data, this works well for Azure Automation runbooks that need to store data somewhere

I hope you find some good use for this module, it is currently published on PSGallery and GitHub.

Thanks for reading.

-Jason (JSON)

2 thoughts on “PS Module for DocumentDB

  1. Just a brief update, I have another version in the works (when I have spare time) to enable stored procedures and a few other tweaks.
    I ran into an issue today where a query was not returning data from a VM i had published into a collection. The data causing the issue in the collection was as follows:
    “StatusCode”: {
    “value”: 200,
    “Value”: “OK”
    }
    As you see it has duplicate Keys, Json validation doesn’t care about that, however when you pull the data back in and expect it to convert into a PSObject, Powershell screams at you in red.
    convertfrom-json : Cannot convert the JSON string because a dictionary that was converted from the string contains the duplicated keys ‘value’ and
    ‘Value’.

    For this case you either have to filter the data going in or use the -NoClean switch when you query the data and then manipulate the JSON before you convert it.

    If anyone has any ideas on this one I’d love to discuss, in the mean time I’ll see what I can do with it for my problem, most likely filter the data going up since Powershell Automation this will be it’s primary purpose.

    Like

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s