Data Schema

This documents the different data schema's employed by SMUG to save and read processed data.

smug_reports

The reports collection holds the large scale configurations for the reports. These apply to a broad set of tweets and will contain data specific for a report.

Example

{ 
    "_id" : { 
        "$oid" : "5a1580d5c58878fd9524e954" 
    }, 
    "name" : "Word vectoring", 
    "enabled" : true, 
    "parameters" : [ "ziek", "griep", "verkouden", "verkoudheid", "koorts", "hoofdpijn" ] 
}

smug_messages

This collection holds data to a single specific message. It has 2 main fields

Metadata

Tracks data related to a tweet such as

  • Source
  • Dataset
  • Language
  • rated words
  • Links

Reports

Tracks the result of analyses on the dataset. This is an array of reports. These fields are not certain as report properties can be toggled at the report level. Thus a report can be vector-scored by not location analysed. All items in this dataset have a reference to the report id.

{ 
  "_id" : { 
    "$oid" : "5a1580dac58878fd9524e96b" 
  }, 
  "metadata" : { 
    "date" : { 
      "$date" : "2017-01-01T16:00:00.090+0000" 
    }, 
    "url" : { 
      "$numberLong" : "815573292006379520" 
     }, 
     "type" : "post", 
     "source" : "twitter", 
     "source_import" : "twinl", 
     "lang" : "nl_NL.UTF-8", 
     "message_words" : [ "oeps", "kleine", "wereld" ] 
   }, 
   "author" : "8077b14c8aa0c80f6d9bb615", 
   "message" : "@b89221b0706cf91a3e6d7532 oeps kleine wereld...", 
   "reports" : [ 
    { "id" : "5a1580d5c58878fd9524e954", "score" : 0.12757538982134464, "scored_words" : [ "oeps", "kleine", "wereld" ] } 
   ] 
  }