First take at backing up my Hive blog (part 3)

@helcim

(part 2)

At this point (provided you managed to download all your history) you could use your favourite language's JSON library to extract all the information you need (well, just the image URLs, to tell the truth, along with the post slugs so you know which images go where). But why not make things more difficult ('cause it's such a trivial task) and use a tool not fit for the purpose at all? Let's try to do it in bash!
Just joking. As a lazy programmer I am actually doing it the easy way ;) On my system (Ubuntu 20.10) there's a nice little tool called jq. As you might expect, it queries JSON and outputs the value of any field you request. So now that you've got the tool, you have to figure out the path to your field. Hint: each entry in the history is itself a list, and the transaction data is its second element.

So, to get the URLs of the images used in transaction 277, I have to look in the json_metadata field like this (dump.json is of course the file containing the account history; the index is 276 because jq counts from zero):

jq '.result.history[276][1].op.value.json_metadata' dump.json

The output, however, is not quite ready for further use:

"{\"tags\": 
   [\"polish\", 
    \"portrety\", 
    \"sztukapolska\", 
    \"art\"], 
    \"image\":  
       [\"https://cdn.steemitimages.com/DQmekNTjo4dg1ZYEUZT89cvNUPgHA8rYJqqro169sc6Mu73/lampijanchrzcicieldamaz.jpg\", 
        \"https://cdn.steemitimages.com/DQmc4ADR5v6i9QGoy6CYda7xm6oGXjvte9tuabaWEXg7jmG/katarz.jpg\"], 
    \"app\":\"steemit/0.1\", 
    \"format\":\"markdown\" 
}" 
 

First, it is escaped: json_metadata is stored as a string that itself contains JSON, so jq prints it as one quoted string. Second, once decoded, it is an object with several keys. You can see that the "image" key holds not a single URL but a list of URLs. I don't know why it isn't called "images" instead, but never mind.

At this point you are probably starting to wonder whether to use awk or sed to unescape this tiny JSON snippet. Please, don't! It's as easy as using jq's built-in fromjson function.

jq '.result.history[276][1].op.value.json_metadata|fromjson' dump.json 

Now, you get proper json:

{ 
  "tags": [ 
    "polish", 
    "portrety", 
    "sztukapolska", 
    "art" 
  ], 
  "image": [ 
    "https://cdn.steemitimages.com/DQmekNTjo4dg1ZYEUZT89cvNUPgHA8rYJqqro169sc6Mu73/lampijanchrzcicieldamaz.jpg", 
    "https://cdn.steemitimages.com/DQmc4ADR5v6i9QGoy6CYda7xm6oGXjvte9tuabaWEXg7jmG/katarz.jpg" 
  ], 
  "app": "steemit/0.1", 
  "format": "markdown" 
} 
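
For comparison, here is a rough sketch of the same double decoding done with a JSON library, the route mentioned at the start of this post. It uses Python's standard json module. The tiny in-memory dump and the example.com URLs are made up for illustration; only the key names along the path are taken from the jq query above.

```python
import json

# Made-up miniature of the dump: only the keys on the path used in the post.
metadata = {
    "tags": ["polish", "art"],
    "image": ["https://example.com/a.jpg", "https://example.com/b.jpg"],
}
dump = {
    "result": {
        "history": [
            # Each entry is a list; the transaction data is its second element.
            [0, {"op": {"value": {"json_metadata": json.dumps(metadata)}}}],
        ]
    }
}

# Serialise and parse once, as if the dump had been read from a file.
loaded = json.loads(json.dumps(dump))

raw = loaded["result"]["history"][0][1]["op"]["value"]["json_metadata"]
print(type(raw).__name__)  # still a plain string after the first parse

decoded = json.loads(raw)  # second parse, the counterpart of jq's fromjson
print(decoded["image"])
```

The point is only that the field has to be parsed twice: once as part of the whole file, and once more on its own.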
 

Now it's easy-peasy. Just pipe it to another jq invocation:

jq '.result.history[276][1].op.value.json_metadata|fromjson' dump.json | jq '.image[]'

which gives

"https://cdn.steemitimages.com/DQmekNTjo4dg1ZYEUZT89cvNUPgHA8rYJqqro169sc6Mu73/lampijanchrzcicieldamaz.jpg" 
"https://cdn.steemitimages.com/DQmc4ADR5v6i9QGoy6CYda7xm6oGXjvte9tuabaWEXg7jmG/katarz.jpg" 

If you wonder why "[]" is appended to ".image": it's jq's syntax for iterating over a list, so each element is output as a separate value on its own line instead of all of them together as one JSON list. This will come in handy when we try to download these images. You can check, they are really there: https://cdn.steemitimages.com/DQmekNTjo4dg1ZYEUZT89cvNUPgHA8rYJqqro169sc6Mu73/lampijanchrzcicieldamaz.jpg and https://cdn.steemitimages.com/DQmc4ADR5v6i9QGoy6CYda7xm6oGXjvte9tuabaWEXg7jmG/katarz.jpg
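
The same "one element at a time" idea, stretched over the whole history instead of a single transaction, could look roughly like the Python sketch below. This is not the method used in this series, just an illustration. The function name image_urls, the sample data and the example.com URLs are hypothetical, and it is an assumption that some history entries carry no json_metadata or no "image" key at all.

```python
import json


def image_urls(dump):
    """Yield every URL listed under "image" in any entry's json_metadata."""
    for entry in dump["result"]["history"]:
        # entry[1] is the transaction data (second element of each entry).
        value = entry[1].get("op", {}).get("value", {})
        raw = value.get("json_metadata")
        if not isinstance(raw, str) or not raw:
            continue  # assumed case: operation without metadata
        try:
            metadata = json.loads(raw)  # same job as jq's fromjson
        except json.JSONDecodeError:
            continue  # skip metadata that is not valid JSON
        if not isinstance(metadata, dict):
            continue
        images = metadata.get("image", [])
        if isinstance(images, list):
            yield from images  # one URL at a time, like .image[]


# Made-up sample with the same nesting as the path used in the post.
sample = {"result": {"history": [
    [0, {"op": {"value": {"json_metadata": json.dumps(
        {"image": ["https://example.com/a.jpg", "https://example.com/b.jpg"]})}}}],
    [1, {"op": {"value": {"weight": 10000}}}],           # no metadata at all
    [2, {"op": {"value": {"json_metadata": ""}}}],       # empty metadata
    [3, {"op": {"value": {"json_metadata": json.dumps({"tags": ["art"]})}}}],
]}}

urls = list(image_urls(sample))
for url in urls:
    print(url)
```

Writing it as a generator keeps the output a flat stream of URLs, which is the shape a download loop would want.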

(to be continued…)
