Shangrila: November 2014

Wednesday, November 26, 2014

VMware Player scare: boot-time error "The file specified is not a virtual disk"

As documented here, the error is caused by a mismatch between the virtual disk's descriptor file and its data files. In my case, a VMDK file had been deleted (presumably by VirtualBox) and was miraculously recovered with TOKIWA data recovery. The restored file was intact, but VMware Player did not recognize it. The fix was to unregister the VM and register it again. Blessed be the goddess.

Ugly hack - Elasticsearch plugin (Knapsack) install on Windows

Under Cygwin, with a mix of POSIX and Windows paths
Firewalled ES node: downloaded the plugin zip on the RDP client and pulled it onto the server via \\tsclient
ES server is embedded in a proprietary stack

$ 'C:\Program Files\ObscureDistro\java/bin/java' -Xmx64m -Xms16m -Delasticsearch '-Des.path.home=/cygdrive/c/Program Files/ObscureDistro/server/bin' -cp 'C:\Program Files\ObscureDistro\server\lib.obscure\*' org.elasticsearch.plugins.PluginManager -install knapsack -url 'file:///\\tsclient\C\Users\someone\Downloads\elasticsearch-knapsack-1.3.2.0-plugin.zip'

-> Installing knapsack...
Trying file://///tsclient/C/Users/someone/Downloads/elasticsearch-knapsack-1.3.2.0-plugin.zip...
Downloading .....DONE
Installed knapsack into C:\cygdrive\c\Program Files\ObscureDistro\server\bin\plugins\knapsack

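The ES node must be restarted for the plugin to load. Note that the Windows JVM does not understand Cygwin paths: the /cygdrive/c/... value of -Des.path.home was resolved against the root of the current drive, hence the C:\cygdrive\c\... install path above. Check that this is where the node actually looks for plugins.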

Tuesday, November 25, 2014

Using strace

Thursday, November 13, 2014

Scala Option canonical guide

http://tonymorris.github.io/blog//posts/scalaoption-cheat-sheet/
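The cheat sheet above expresses each Option method as an equivalent pattern match. A minimal sketch of three of those equivalences (map, flatMap, getOrElse); the object and method names are hypothetical, and each function is meant to behave exactly like the built-in method it mirrors.

```scala
object OptionByMatch {
  // option.map(f): apply f inside Some, leave None alone
  def map[A, B](option: Option[A])(f: A => B): Option[B] = option match {
    case None    => None
    case Some(x) => Some(f(x))
  }

  // option.flatMap(f): f already returns an Option, so no extra wrapping
  def flatMap[A, B](option: Option[A])(f: A => Option[B]): Option[B] = option match {
    case None    => None
    case Some(x) => f(x)
  }

  // option.getOrElse(default): default is by-name, evaluated only for None
  def getOrElse[A](option: Option[A], default: => A): A = option match {
    case None    => default
    case Some(x) => x
  }
}
```

For example, `OptionByMatch.map(Some(2))(_ * 10)` gives `Some(20)`, the same as `Some(2).map(_ * 10)`.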

Scala anonymous functions

scala> val f:()=>Boolean = () => true
f: () => Boolean = <function0>
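A few more function-literal shapes next to the zero-argument one from the REPL session, as a small self-contained sketch (object and value names are hypothetical):

```scala
object AnonymousFunctions {
  // Zero-argument function literal, same shape as the REPL example: () => Boolean
  val alwaysTrue: () => Boolean = () => true

  // One-argument literal with an explicit parameter type
  val inc: Int => Int = (x: Int) => x + 1

  // Placeholder syntax: each underscore stands for the next parameter
  val add: (Int, Int) => Int = _ + _

  // Literals can be passed straight to higher-order methods
  val evens: List[Int] = List(1, 2, 3, 4).filter(n => n % 2 == 0)

  def main(args: Array[String]): Unit = {
    println(alwaysTrue()) // true
    println(inc(41))      // 42
    println(add(2, 3))    // 5
    println(evens)        // List(2, 4)
  }
}
```

A `() => Boolean` value is invoked as `f()`; plain `f` refers to the function object itself, which is why the REPL echoes `<function0>`.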

Tuesday, November 11, 2014

Elasticsearch - useful cats

http://myhost:9200/_cat/thread_pool?v

Shows (the ?v flag adds the column header row):

host ip bulk.active bulk.queue bulk.rejected index.active index.queue index.rejected search.active search.queue search.rejected
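One way to consume this output programmatically, sketched in Scala: parse the verbose table into one map per row and keep the rows with rejections. Assumptions: columns are whitespace-separated and no cell contains spaces; `CatThreadPool`, `parse`, `withRejections` and `fetch` are hypothetical names, and `fetch` is a bare `scala.io.Source.fromURL` call with no auth or timeout handling.

```scala
import scala.io.Source
import scala.util.Try

object CatThreadPool {
  // Parse verbose _cat output: the first non-empty line is the header,
  // every following line is one row; columns are separated by whitespace.
  def parse(text: String): List[Map[String, String]] =
    text.split("\\r?\\n").map(_.trim).filter(_.nonEmpty).toList match {
      case header :: rows =>
        val cols = header.split("\\s+").toList
        rows.map(row => cols.zip(row.split("\\s+")).toMap)
      case Nil => Nil
    }

  // Keep rows where any "*.rejected" counter is non-zero (non-numeric cells count as 0)
  def withRejections(rows: List[Map[String, String]]): List[Map[String, String]] =
    rows.filter(_.exists { case (col, value) =>
      col.endsWith(".rejected") && Try(value.toLong).getOrElse(0L) > 0
    })

  // Fetch the raw table over HTTP, e.g. http://myhost:9200/_cat/thread_pool?v
  def fetch(url: String): String = {
    val src = Source.fromURL(url)
    try src.mkString finally src.close()
  }
}
```

Usage: `CatThreadPool.withRejections(CatThreadPool.parse(CatThreadPool.fetch("http://myhost:9200/_cat/thread_pool?v")))` returns only the nodes whose bulk, index or search pools have rejected work.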

Sunday, November 09, 2014

Scala notes

Solid, short video lessons: InfiniteSkills

foldLeft examples
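A few minimal foldLeft sketches for illustration (object and method names are hypothetical): `xs.foldLeft(z)(op)` starts from `z` and combines the accumulator with each element from left to right.

```scala
object FoldLeftExamples {
  // Sum: start from 0 and add each element in turn: ((0 + 1) + 2) + 3 ...
  def sum(xs: List[Int]): Int = xs.foldLeft(0)(_ + _)

  // Reverse: prepending each element to the accumulator reverses the order
  def reverse[A](xs: List[A]): List[A] =
    xs.foldLeft(List.empty[A])((acc, x) => x :: acc)

  // Word count: thread an immutable Map through the fold
  def wordCount(words: List[String]): Map[String, Int] =
    words.foldLeft(Map.empty[String, Int]) { (acc, w) =>
      acc.updated(w, acc.getOrElse(w, 0) + 1)
    }
}
```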

Friday, November 07, 2014

Elasticsearch multilevel aggregation

/*
SELECT count(*)
FROM docs
GROUP BY storm_data_spout.task_id
UNION
SELECT count(*)
FROM docs
GROUP BY storm_data_bolt.task_id
*/

{
"query": {
"match_all": {}
},
"aggs": {
"spout": {
"terms": {
"field": "storm_data_spout.task_id"
}
},
"bolt": {
"terms": {
"field": "storm_data_bolt.task_id"
}
}
}
}
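Each terms agg above buckets docs by one field, independently of the other, much like the two separate GROUP BY queries in the comment. A plain-Scala sketch of that counting over in-memory data; `Doc` is a hypothetical stand-in for an indexed document holding only the two task ids, and unlike the terms agg (which returns the top 10 buckets by default) it keeps every bucket.

```scala
object IndependentTermCounts {
  // Hypothetical stand-in for an indexed doc: only the two task ids matter here
  final case class Doc(spoutTaskId: Long, boltTaskId: Long)

  // One terms agg: doc count per distinct value of the chosen field
  def countBy[K](docs: Seq[Doc])(field: Doc => K): Map[K, Int] =
    docs.groupBy(field).map { case (value, bucket) => value -> bucket.size }
}
```

Calling `countBy(docs)(_.spoutTaskId)` and `countBy(docs)(_.boltTaskId)` corresponds to the "spout" and "bolt" aggs respectively.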

// ======================

/*
SELECT count(*)
FROM docs
GROUP BY storm_data_spout.task_id, storm_data_bolt.task_id
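-- a terms agg takes a single field, so a flat GROUP BY on two fields is not available directly (nesting one terms agg inside another works, but returns a bucket tree rather than combined keys). Using a script workaround per http://bit.ly/1uI76eO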
*/

{
"query": {
"match_all": {}
},
"aggs": {
"spout-bolt": {
"terms": {
"script": "doc['storm_data_spout.task_id'].getValues() + '|' + doc['storm_data_bolt.task_id'].getValues()"
}
}
}
}
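The script builds one combined key per document, so each bucket corresponds to a (spout, bolt) pair, i.e. GROUP BY on both fields. The same grouping over hypothetical in-memory docs, keyed first by a tuple and then by a flat "spout|bolt" string. This is only an illustration of the grouping: the exact key string the ES script emits may differ (for instance, getValues() may render list brackets).

```scala
object CompositeTermCounts {
  // Hypothetical stand-in for an indexed doc: only the two task ids matter here
  final case class Doc(spoutTaskId: Long, boltTaskId: Long)

  // GROUP BY spout, bolt: the bucket key is the (spout, bolt) pair
  def countByPair(docs: Seq[Doc]): Map[(Long, Long), Int] =
    docs.groupBy(d => (d.spoutTaskId, d.boltTaskId)).map { case (pair, bucket) => pair -> bucket.size }

  // Same buckets, keyed by a flat "spout|bolt" string like the script's combined key
  def countByJoinedKey(docs: Seq[Doc]): Map[String, Int] =
    countByPair(docs).map { case ((spout, bolt), n) => s"$spout|$bolt" -> n }
}
```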

Thursday, November 06, 2014

rsync

Synchronize directory trees incrementally (only files that differ in size or modification time get pushed)

rsync -vazh ~/git/myproj --exclude 'node_modules/' --exclude '.git/' --exclude '.idea/' myser@dev01:~/git

-a: archive mode (recursive; preserves symlinks, permissions, timestamps, owner and group; same as -rlptgoD)
-v: verbose
-h: human-readable output
-z: compress
-u: update; skip files that are newer on the destination (not included in the command above)
--exclude: skip paths matching the given pattern. Use a separate --exclude for every excluded path

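The trailing slash on the source path matters. With a trailing slash (~/git/myproj/), rsync copies the contents of that directory into the destination path; without it (~/git/myproj), rsync creates the directory itself inside the destination. The command above omits the slash on purpose, because the destination is the parent directory (~/git), so the result is ~/git/myproj on dev01. Syncing ~/git/myproject/src into ~/git/myproject/src without the trailing slash would instead produce ~/git/myproject/src/src/.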
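The same sync can be driven from Scala with `scala.sys.process`, which makes the "one --exclude per path" rule explicit in the argument list. A minimal sketch; `RsyncSync`, `rsyncArgs` and `sync` are hypothetical names. rsync is run directly, without a shell, so no quoting is needed and a local "~" would not be expanded (hence `user.home`); the remote "~/git" is passed through literally, just as in the shell command.

```scala
import scala.sys.process._

object RsyncSync {
  // Build the argv: flags, one "--exclude" per pattern, then source and destination.
  // The source string is passed untouched, so a trailing slash keeps its rsync meaning.
  def rsyncArgs(src: String, dest: String, excludes: Seq[String]): Seq[String] =
    Seq("rsync", "-vazh") ++ excludes.flatMap(pattern => Seq("--exclude", pattern)) ++ Seq(src, dest)

  // Run rsync and return its exit code
  def sync(src: String, dest: String, excludes: Seq[String]): Int =
    rsyncArgs(src, dest, excludes).!

  def main(args: Array[String]): Unit = {
    val home = sys.props("user.home")
    val exit = sync(s"$home/git/myproj", "myser@dev01:~/git", Seq("node_modules/", ".git/", ".idea/"))
    println(s"rsync exited with $exit")
  }
}
```

Running `main` really invokes rsync; `rsyncArgs` alone is enough to check the command line before doing so.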