Как распарсить json в excel
[Disclaimer: the Excel-JSON import offered by the Power Query Excel add-in is only available for Windows — upvote this UserVoice feature request so the team prioritizes adding the PowerQuery Excel add-in for Mac.]
As part of a UW class on information visualization taught by Brock Craft, I found myself changing gears from big data, to much smaller data.
Data wrangling is a necessary step when working with publicly available data and enriching/preparing said data for consumption by apps, services, or even visualizing it directly with libraries such as D3.js. Online services such as the Microsoft Azure Maps service API or the Google Maps API can help to enrich location data by either geocoding, or reverse-geocoding depending on the data available, and format needed. In my case, I had a very simple dataset that had (among many) a column for the name of a US city, and another for the US state for each row.
My goal: enrich each row in the dataset with the latitude and longitude to be able to use D3.js more efficiently.
-
— super easy to create and format requests to web APIs. — my favorite lightweight editor — the ultimate application to work with smaller datasets by using pivot tables, look-ups, etc. — used to query each city and state in order to obtain the address, geographic coordinates, and other useful pieces of data, such as county, state initials, etc.
[As a prerequisite, you need an Azure subscription — you can get some free credit when signing up for Azure]
The first step was to create an Azure Maps account from inside the Azure Portal and obtain an API key. You can see how to do it in the Azure Maps documentation here.
With the Azure Maps API key, I used the Postman app to create a few sample requests and ensure the key was working nicely. You can see how to submit them in the How to Search For an Address section in the Azure Maps docs.
Once the basics worked, I made an Excel pivot table from the original dataset to list the distinct combinations of city and state and used the preview batch request API to send a query with them. This was a better approach than querying for the same city and state hundreds, or thousands of times.
I copied the response data into VS Code and saved it as .JSON to then work with the file in Excel.
Open the JSON file using the Power Query add-in:
Data →Get Data →From File →From JSON
Once you select the file, you’ll see the root nodes in the JSON file. Note that these are links that you can navigate to.
The first step in the Power Query Editor is to convert the nodes into columns.
Record Tools →Convert →Into Table
They will look like this once they are converted:
Click on the item List since it contains all the records we’re interested in. The summary shows the summary of the API request; for this process it is not relevant.
We’ll do the same transforming into table at this level:
After it is converted to a table, you’ll see a small icon that lets you expand the contents into columns. Click it to see the contents of the record object and to expand it:
After expanding, the table will show a record object and a list object in each row. This is because each row represents:
- A query record (the query we did for each row, such as Paris, Texas, US, or Seattle, Washington, US).
- A list of the associated search results for the query which can contain 1 or more results depending on the request settings. In this case, to make my life easier, in the request I specified only one.
If you notice the Postman screenshot, shows the first few rows of the result — you can see how results is in fact a list of records.
With this table, we will do the exact same thing again for each objects column→ Expand to new rows.
After expanding the query summaries, we scroll right, and then expand the results objects column:
This expansion will change the object from list to record because each list has only one record:
We expand this again:
After expanding, we’ll see data, plus some more objects:
From this table, we will expand the address objects once more to get the actual addresses, and the position object to get the latitude and longitude too:
At this point, we’ll see the actual latitude and longitude. Whew!
Once we’ve expanded the necessary objects (and if you want, hid unnecessary columns), we can save and load the table into Excel:
Once you load the data into the Excel sheet, it will be ready to use:
From this point on you can filter the data, create a pivot table, or do any other thing that you need to refine the dataset that you’re working with.
This guide covered some basic steps to import a JSON file into excel and how to expand the contents to see the actual data returned by the Azure Maps API.
I loved using the API directly — it took me in total about 20 minutes to geocode about ~4,000 distinct addresses, which I then added to my main dataset.
The steps shown here worked well for my needs, but be aware that there are other ways to do this using Python, or even using other services, such as Google Maps. Depending on your specific needs you might need to fine tune the API request, or filter specific fields returned by the API.
With this wrangling step complete, you should be able to import your data into D3.js or other application and start to work on the visualization.
I'm trying to extract JSON data into Excel sheet as table by using the following code.
However, I'm getting the below error!
Why I'm getting this error? Kindly advise
I highly recommend to activate Option Explicit : In the VBA editor go to Tools › Options › Require Variable Declaration.
2 Answers 2
Going further there are objects within data array, each of them contains some properties that can be populated in rows on the worksheet:
Here is VBA example showing how that values could be retrieved. Import JSON.bas module into the VBA project for JSON processing.
The output for data array for me is as follows:
BTW, the similar approach applied in other answers.
I pasted your code into a test module and then imported the JsonConverter as an additional module in my empty workbook. The error you're getting is likely because you need to add the "Microsoft Scripting Runtime" library to your workbook. In the VBE go to the Tools-->References. menu and then scroll down and put a check mark next to the library. After doing this, your code parsed the JSON without issue.
However it did fail in your loop.
I highly recommend that you use Option Explicit at the top of your module. The variable types you think you're using (because I see you're attempting to use Hungarian notation) are not the types of the actual data necessarily. My suggestion is to use descriptive names for the variables to avoid confusion. Additionally, you should be looping on the oJSON("data") structure (which is a Collection by the way). Here is my suggestions put into practice:
I have the same issue as in Excel VBA: Parsed JSON Object Loop but cannot find any solution. My JSON has nested objects so suggested solution like VBJSON and vba-json do not work for me. I also fixed one of them to work properly but the result was a call stack overflow because of to many recursion of the doProcess function.
The best solution appears to be the jsonDecode function seen in the original post. It is very fast and highly effective; my object structure is all there in a generic VBA Object of type JScriptTypeInfo.
The issue at this point is that I cannot determine what will be the structure of the objects, therefore, I do not know beforehand the keys that will reside in each generic objects. I need to loop through the generic VBA Object to acquire the keys/properties.
If my parsing javascript function could trigger a VBA function or sub, that would be excellent.
I remember your previous question, so it's interesting to see it back again. One question I would have is: let's say you succeed in parsing your JSON in VBA - how would you then use that "object" in VBA? You note that the JSON structure can be of any type, so how would you navigate the end result in VBA? My first thought might be to create a JScript which would parse the JSON (using eval or even one of the "better" existing libraries) and then iterate over the structure to produce a nested scripting dictionary-based object to pass back to VBA. What are you doing with your parsed JSON ?
I will create a sheet for each object and add the records on each row, creating the column if not already existing (appending in row1). Your suggested asp-xtreme-evoluton seems interesting. Was in the process of creating something very similar. I have been provided a fixed and almost working (I fixed the little "issue") of the vba-json class. We'll be using that for the moment. The working vba-json was provided by Randyr, the author of the related question.
@tim, my previous comment might not answer your question properly. I know that the structure is basically a list of tables with records. So I have an Object (key:value) representing the tables. The "key" is the table name and the value is an Array [] of the records which are Object (key:value). I don't know for a fact which table have been provided and which columns(fields) are available. For people that can't do without a strict structure, it is wild generic programing :-) no offense to anybody of course.
More easy to follow if the structures are similar but the "keys" are different. Out of interest where is the data coming from?
11 Answers 11
If you want to build on top of ScriptControl , you can add a few helper method to get at the required information. The JScriptTypeInfo object is a bit unfortunate: it contains all the relevant information (as you can see in the Watch window) but it seems impossible to get at it with VBA. However, the Javascript engine can help us:
- If the JScriptTypeInfo instance refers to a Javascript object, For Each . Next won't work. However, it does work if it refers to a Javascript array (see GetKeys function).
- The access properties whose name is only known at run-time, use the functions GetProperty and GetObjectProperty .
- The Javascript array provides the properties length , 0 , Item 0 , 1 , Item 1 etc. With the VBA dot notation ( jsonObject.property ), only the length property is accessible and only if you declare a variable called length with all lowercase letters. Otherwise the case doesn't match and it won't find it. The other properties are not valid in VBA. So better use the GetProperty function.
- The code uses early binding. So you have to add a reference to "Microsoft Script Control 1.0".
- You have to call InitScriptEngine once before using the other functions to do some basic initialization.
This answer seems what I want but I'm getting a object variable not set when trying the DecodeJsonString function. Are there any other references I need apart from Microsoft Script Control?
If there was a missing reference, you'd get a different error message. On what line does the error occur? On what is the value of the variables used in that line?
It occurs just after the line Set DecodeJsonString = ScriptEngine.Eval("(" + JsonString + ")") . The JsonString is just a plain json object. I've tried with a variety of Json objects and get the same error.
The best ever answer . I have just completed a POC on how to call an JSON Restful service , parse the recieved json based on your answer and then displayed it in Excel . This was very well recieved by our clients. Thank you very much . +1 for this ..
I got your solution to work for VBScript by stripping out the types and initializing by using the following: Set se = CreateObject("MSScriptControl.ScriptControl") . +1 Thanks!
UPDATE 3 (Sep 24 '17)
Check VBA-JSON-parser on GitHub for the latest version and examples. Import JSON.bas module into the VBA project for JSON processing.
UPDATE 2 (Oct 1 '16)
However if you do want to parse JSON on 64-bit Office with ScriptControl , then this answer may help you to get ScriptControl to work on 64-bit.
UPDATE (Oct 26 '15)
Trying to avoid that, I've created JSON parser based on RegEx's. Objects <> are represented by dictionaries, that makes possible to use dictionary's properties and methods: .Count , .Exists() , .Item() , .Items , .Keys . Arrays [] are the conventional zero-based VB arrays, so UBound() shows the number of elements. Here is the code with some usage examples:
One more opportunity of this JSON RegEx parser is that it works on 64-bit Office, where ScriptControl isn't available.
INITIAL (May 27 '15)
Here is one more method to parse JSON in VBA, based on ScriptControl ActiveX, without external libraries:
To process json data in VBA, there are very few choices. They are scant, really. And there is nothing built into VBA, like the Javascript function, JSON.parse(). So how do we process json data in VBA?
A makeshift tactic that many have used is to fire up the Script Control or the Webbrowser control or automate a silent instantiation of Internet Explorer with the aim of having Javascript executed in those environments evaluate the json string and through a convoluted process make key/value pairs of data available to VBA. This can work, but this method has drawbacks. Evaluation of a json string can be a security liability because Javascript’s eval() function has access to the hard drive. This liability can be mostly limited by utilizing Crockford’s JSON2.js, but now the process of having Javascript parse the json string is even more convoluted and in the end eval() is still used to create actual Javascript objects from the json, but only after a bunch of checks to ensure that the json will not do anything untoward.
There are indeed a few great VBA libraries, VBA-JSON for example, that can be loaded into your project that will parse json, allowing easy access to the data. But these are all huge code bases that can seem like overkill.
And of course, there is PowerQuery. But if you don’t know PQ, this can be quite intimidating.
The only other option is to write your own VBA routine to parse out the values from a json string or file. However, this gets tricky, especially for generic json that you are not looking at when you craft your VBA routine, and especially for deeply nested json.
But we can do it in VBA and it can be small, fast, and easy to use:
Of course, the above uses a few helper routines (included below), but I wanted to show how concise the logic is. There will be unusual edge cases where this breaks, but I’ve tested it on hundreds of json files, from simple to deeply nested and it has never failed yet.
It is extremely fast. It’s instantaneous on typical web API json response strings. It takes less than a second to process a megabyte-sized file.
It produces a dictionary where the dictionary keys are full textual paths to every data value in the source json string, making access a breeze.
Take the following tiny json string as an example:
We can list every path, key and data item like so:
The above uses a helper function, ListPaths(). It is included below, but the simple code here prints the following in the VBEditor Immediate Window:
And you can get at any item by using the full path:
Or you can filter items to an array:
You might be wondering how this is even possible. At first glance it does not appear that the three main functions listed at the top actually do much. But they do! They work because ParseObj() and ParseArr() are recursive… which means they can and do call themselves and each other, over and over, as they work through the entire json string.
Some real magic happens in a support function called, Tokenize(). It processes the source json string into a list of tokens and that list of tokens is what is actually parsed by the recursive functions above. After tokenizing the source json string, Tokenize() returns an array that has one token in each element. The above json looks like the following after being tokenized (every line in the following listing is an element in the array returned):
And then from that array, the recursive functions are able to compile the paths, keys and data values and add them to the dictionary.
So the Tokenize() function eliminates all nonessential white space, removes quotes from quoted items (but leaves all escaped characters in place) and isolates every token. This can easily be done in VBA code, but it is more than 100 times quicker to have a RegEx pattern match what we need. The following Tokenize() function uses VBScript’s regex engine to do the heavy lifting:
Granted, the regex matching pattern looks intimidating, but it is actually extremely efficient. When given a source json string (and it does not matter if the string is pretty-printed, normal, or minified), it instantly (less than a millisecond) matches the bits we want and transforms the entire string into the tokenized array listing shown above.
So here is the full listing of our VBA JSON Parser including support functions:
This code makes no attempt to verify if the source json is valid. It will raise a runtime error if the json is invalid. But virtually all json data I work with is valid from the get go.
This code also does not attempt to convert values into data types. So that no false assumptions are introduced, all values remain string values. The data types can be converted by your code easily if you need or want that.
In a similar vein, all escaped characters are untouched.
Also, the code does not attempt to build complex objects in VBA. Instead every single full path including the final key is stored AS TEXT in the dictionary as a dictionary key to the actual data item. This can simplify programmatic access to the data… a lot.
And finally, this code just parses JSON. There are no routines included here to create a json string from your data. This article is about reading json data from VBA.
You can parse (or deconstruct) the contents of a column with text strings that contain JSON or XML. To help illustrate how to use the Parse command, the following sample data starts with JSON and XML data entered as text fields.
For more information on importing JSON and XML, see Import data from external data sources.
JavaScript Object Notation (JSON) is a common data format, and you can import it into Excel.
To transform the SalesPerson column from text strings to a structured Record column:
Select the SalesPerson column.
Select Transform > Parse > JSON.
Select Record to see the values.
Select the Expand icon next to the SalesPerson column header. From the Expand columns dialog box, select only the FirstName and LastName fields.
Extensible Markup Language (XML) is a common markup and data format, and you can import it into Excel.
To transform the Country column from text strings to Table values.
Select the Country column.
Select Transform > Parse > XML.
Select Table to see the values.
Select the Expand icon next to the Country column header. From the Expand columns dialog box, select only the Country and Division fields.
Tip With a Table, you can also select Aggregate to aggregate data in different ways, such as Sum or Count. For more information, see Aggregate data from a column.
Читайте также: