Wednesday, November 26, 2014

VMware Player scare: Boot time error: The file specified is not a virtual disk

As documented here, the issue occurs when the settings in the virtual disk's descriptor file do not match its data files. In my case a VMDK file had been deleted (presumably by VirtualBox) and was miraculously recovered with TOKIWA data recovery. The restored file was not corrupted, but VMware Player did not recognize it. The solution was to unregister the VM and register it anew. Blessed be the goddess.

Ugly hack - Elasticsearch plugin (Knapsack) install on Windows



  • Under Cygwin, with a mix of POSIX and Windows paths
  • Firewalled ES node: downloaded the plugin zip on the RDP client and pulled it in via \\tsclient
  • ES server is embedded in a proprietary stack


$ 'C:\Program Files\ObscureDistro\java/bin/java' -Xmx64m -Xms16m -Delasticsearch '-Des.path.home=/cygdrive/c/Program Files/ObscureDistro/server/bin' -cp  'C:\Program Files\ObscureDistro\server\lib.obscure\*' org.elasticsearch.plugins.PluginManager -install knapsack -url 'file:///\\tsclient\C\Users\someone\Downloads\elasticsearch-knapsack-1.3.2.0-plugin.zip'

-> Installing knapsack...
Trying file://///tsclient/C/Users/someone/Downloads/elasticsearch-knapsack-1.3.2.0-plugin.zip...
Downloading .....DONE
Installed knapsack into C:\cygdrive\c\Program Files\ObscureDistro\server\bin\plugins\knapsack



  • The ES node must be restarted for the plugin to load
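The install log shows that the Windows JVM took the Cygwin-style es.path.home literally: /cygdrive/c/Program Files/... became C:\cygdrive\c\Program Files\..., so the plugin was installed under a literal C:\cygdrive folder, which is likely not where the embedded node looks for plugins. Cygwin's cygpath -w converts such paths; below is a minimal Python sketch of the same mapping, assuming only the default /cygdrive/<letter>/ prefix (no custom mount points). The helper name cygwin_to_windows is hypothetical.

```python
import re

# Matches the default Cygwin drive prefix: /cygdrive/<letter>[/rest]
_CYGDRIVE = re.compile(r"^/cygdrive/([a-zA-Z])(?:/(.*))?$")


def cygwin_to_windows(path: str) -> str:
    """Convert /cygdrive/<letter>/... to a Windows path (hypothetical helper).

    Assumes the default /cygdrive prefix; any other path is returned unchanged.
    """
    match = _CYGDRIVE.match(path)
    if match is None:
        return path
    drive, rest = match.group(1).upper(), match.group(2) or ""
    # Drive letter plus backslash-separated remainder, e.g. C:\Program Files\...
    return f"{drive}:\\" + rest.replace("/", "\\")


if __name__ == "__main__":
    home = "/cygdrive/c/Program Files/ObscureDistro/server/bin"
    print(cygwin_to_windows(home))  # C:\Program Files\ObscureDistro\server\bin
```

From the Cygwin shell itself, the equivalent is to build the argument as "-Des.path.home=$(cygpath -w '/cygdrive/c/Program Files/ObscureDistro/server/bin')".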

Tuesday, November 25, 2014

Using strace


Trace system calls from a process and all its children and threads:

sudo strace -f -p 3914 2>&1 | grep -vE 'clock_gettime|SIGSTOP|gettime|epoll|futex|restart' | head -10000 | less

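-f: follow child processes and threads (fork/vfork/clone)
-p: PID of the running process to attach to
grep -v: drop frequent, uninteresting calls (timers, epoll, futex and similar)
head: stop after 10,000 lines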
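When the trace is saved to a file instead of piped (for example with sudo strace -f -o trace.log -p 3914), the same filtering can be applied afterwards. A minimal Python sketch mirroring the grep -vE and head stages; NOISY and filter_trace are hypothetical names, and it assumes one syscall per line (strace's split "unfinished"/"resumed" lines are not treated specially).

```python
import re
from typing import Iterable, Iterator

# Same exclusion pattern as the grep -vE stage of the strace pipeline.
NOISY = re.compile(r"clock_gettime|SIGSTOP|gettime|epoll|futex|restart")


def filter_trace(lines: Iterable[str], limit: int = 10000) -> Iterator[str]:
    """Yield up to `limit` lines that do not match NOISY (grep -vE ... | head -<limit>)."""
    kept = 0
    for line in lines:
        if kept >= limit:
            return
        if NOISY.search(line):
            continue  # noisy call, skip it like grep -v
        kept += 1
        yield line
```

Usage: iterate over filter_trace(open("trace.log")) and print or search the remaining lines.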

Thursday, November 13, 2014

Tuesday, November 11, 2014

Elasticsearch - useful cats


http://myhost:9200/_cat/thread_pool?v

Shows one row per node (the v parameter adds the header row):

host ip bulk.active bulk.queue bulk.rejected index.active index.queue index.rejected search.active search.queue search.rejected 
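To spot rejections without eyeballing columns, the ?v output can be parsed using its header row. A minimal Python sketch, assuming whitespace-separated values with no embedded spaces and the host column shown above; parse_cat, rejected_pools and fetch_thread_pool are hypothetical names, and myhost:9200 is the host from the URL above.

```python
import urllib.request
from typing import Dict, List


def parse_cat(text: str) -> List[Dict[str, str]]:
    """Parse whitespace-aligned _cat output (with ?v header) into dicts keyed by column."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, values)) for values in body]


def rejected_pools(rows: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each host to the pools whose <pool>.rejected counter is non-zero."""
    result: Dict[str, List[str]] = {}
    for row in rows:
        pools = [
            col[: -len(".rejected")]
            for col, val in row.items()
            if col.endswith(".rejected") and val.isdigit() and int(val) > 0
        ]
        result[row.get("host", "?")] = pools
    return result


def fetch_thread_pool(base_url: str = "http://myhost:9200") -> str:
    """Fetch the raw _cat/thread_pool?v text from a node."""
    with urllib.request.urlopen(base_url + "/_cat/thread_pool?v") as resp:
        return resp.read().decode("utf-8")
```

Usage: rejected_pools(parse_cat(fetch_thread_pool())) lists, per host, the pools that have rejected work.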

Friday, November 07, 2014

Elasticsearch multilevel aggregation

/*
SELECT storm_data_spout.task_id, count(*)
FROM docs
GROUP BY storm_data_spout.task_id
UNION
SELECT storm_data_bolt.task_id, count(*)
FROM docs
GROUP BY storm_data_bolt.task_id
*/

{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "bolt": {
      "terms": {
        "field": "storm_data_bolt.task_id"
      }
    },
    "spout": {
      "terms": {
        "field": "storm_data_spout.task_id"
      }
    }
  }
}
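The response carries one bucket list per named aggregation. A minimal Python sketch that holds the request as a dict, with each aggregation named after the field it buckets, and pulls (key, doc_count) pairs out of a parsed _search response; QUERY and buckets are hypothetical names.

```python
import json
from typing import Any, Dict, List, Tuple

# Request body as a dict; each terms agg is named after the field it groups on.
QUERY = {
    "query": {"match_all": {}},
    "aggs": {
        "spout": {"terms": {"field": "storm_data_spout.task_id"}},
        "bolt": {"terms": {"field": "storm_data_bolt.task_id"}},
    },
}


def buckets(response: Dict[str, Any], agg_name: str) -> List[Tuple[Any, int]]:
    """Return (key, doc_count) pairs from a terms aggregation in a _search response."""
    return [
        (b["key"], b["doc_count"])
        for b in response["aggregations"][agg_name]["buckets"]
    ]


if __name__ == "__main__":
    print(json.dumps(QUERY, indent=2))
```

Note that a terms aggregation returns only the top 10 buckets by default; set "size" inside "terms" to get more.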

// ======================

/*
SELECT storm_data_spout.task_id, storm_data_bolt.task_id, count(*)
FROM docs
GROUP BY storm_data_spout.task_id, storm_data_bolt.task_id
-- an embedded (sub-)aggregation of terms aggs is not supported for this multilevel grouping, so a script workaround is used per http://bit.ly/1uI76eO
*/

{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "spout-bolt": {
      "terms": {
        "script": "doc['storm_data_spout.task_id'].getValues() + '|' + doc['storm_data_bolt.task_id'].getValues()"
      }
    }
  }
}
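To sanity-check the script agg's buckets on a small sample, the intended two-column GROUP BY can be computed locally. A minimal Python sketch, assuming the documents are available as parsed JSON (nested dicts) with storm_data_spout and storm_data_bolt as object fields; get_path and group_count are hypothetical helpers. Missing fields are counted under None here; how the ES script treats them is not modelled.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def get_path(doc: Dict[str, Any], dotted: str) -> Any:
    """Follow a dotted field name such as 'storm_data_spout.task_id' into nested dicts."""
    value: Any = doc
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None  # field missing in this document
        value = value[part]
    return value


def group_count(docs: Iterable[Dict[str, Any]], *fields: str) -> Counter:
    """Count documents per combination of field values (GROUP BY f1, f2, ...)."""
    return Counter(tuple(get_path(doc, f) for f in fields) for doc in docs)
```

Usage: group_count(docs, "storm_data_spout.task_id", "storm_data_bolt.task_id") yields counts keyed by (spout, bolt) pairs, to compare with the "spout|bolt" buckets.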

Thursday, November 06, 2014

rsync


Synchronize directory trees incrementally (only changed files are transferred)

rsync -vazhu ~/git/myproj --exclude 'node_modules/' --exclude '.git/' --exclude '.idea/' myser@dev01:~/git

-a: archive mode (recursive; preserves symlinks, permissions, timestamps, owner and group)
-v: verbose
-h: human-readable output
-z: compress data during transfer
-u: update; skip files that are newer on the destination
--exclude: self-explanatory. Be sure to use a separate --exclude for every excluded path
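By default rsync already skips unchanged files: it compares size and modification time (the "quick check"), and -u only adds the rule that files newer on the receiver are left alone. A minimal Python model of that decision, ignoring options such as --checksum, --size-only and --modify-window; FileStat and should_transfer are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileStat:
    size: int
    mtime: int  # modification time in whole seconds


def should_transfer(src: FileStat, dst: Optional[FileStat], update: bool = False) -> bool:
    """Simplified model of rsync's default quick check, plus the -u rule."""
    if dst is None:
        return True  # missing on the receiver: always sent
    if update and dst.mtime > src.mtime:
        return False  # -u: receiver's copy is newer, leave it alone
    # Quick check: transfer when size or modification time differ.
    return src.size != dst.size or src.mtime != dst.mtime
```

With equal timestamps but different sizes the file is still sent, even under -u.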

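Mind the trailing slash on the source path. With a trailing slash (src/), rsync copies the contents of that directory into the destination path; without it (src), rsync creates the directory itself inside the destination (e.g. syncing ~/git/myproject/src to ~/git/myproject/src/ yields ~/git/myproject/src/src/). The command above omits the slash, which is correct there because its destination is the parent directory ~/git.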
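The trailing-slash rule boils down to a path computation. A minimal Python sketch that only models where the synced directory ends up (it does not call rsync); rsync_dest is a hypothetical name, and tilde paths are treated as plain strings without expansion.

```python
import posixpath


def rsync_dest(src: str, dest_dir: str) -> str:
    """Where the source directory's files land, following rsync's trailing-slash rule.

    'src/' copies the directory's contents straight into dest_dir;
    'src' copies the directory itself, i.e. into dest_dir/src.
    """
    dest = posixpath.normpath(dest_dir)
    if src.endswith("/"):
        return dest
    return posixpath.join(dest, posixpath.basename(src))
```

For the command above, rsync_dest("~/git/myproj", "~/git") returns "~/git/myproj", i.e. the project keeps its own folder on dev01.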