February 12, 2017

Joyent’s Manta. Store and Compute

Until now we only used Joyent’s Triton, where we deployed Docker containers. Sooner or later we want to store data somewhere. Manta is an Amazon S3-like storage service. However, Manta can also compute on your data. Let’s start!

Figure 1. When you can only store data, you have to transfer it every time you want to process it.
Figure 2. Manta can compute, so no need to transfer data around.

Preparation

Manta does have a REST API. However, in this blog post we’ll just use the Manta command line utilities.

sudo npm install manta -g

Then we set up the Manta environment. The exports below are copied from the Manta manual.

# https://my.joyent.com/main/#!/manta/intro lists yours
export MANTA_URL=https://us-east.manta.joyent.com;
export MANTA_USER=<Joyent-User>;
export MANTA_KEY_ID=<ssh-public-key-finger-print>;
# Or read it directly from your local .ssh keys
export MANTA_KEY_ID=$(ssh-keygen -l -f $HOME/.ssh/id_rsa.pub | awk '{print $2}')

That’s all. We’re set up.
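Before running any of the commands below, it can help to confirm that the three variables are really set. Here is a minimal sketch. check_manta_env is a hypothetical helper name of ours, not part of the Manta tools, and it only checks that the variables are non-empty, not that the values are valid.

```shell
# Hypothetical helper (not part of the Manta tools): report which of the
# MANTA_* variables from the setup above are still unset or empty.
check_manta_env() {
    missing=0
    for name in MANTA_URL MANTA_USER MANTA_KEY_ID; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    # Exit status 0 when everything is set, 1 otherwise.
    return $missing
}

check_manta_env || echo "set the variables listed above first"
```

The function returns a non-zero status when something is missing, so it can guard other commands, for example check_manta_env && mls ~~/stor.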

Let’s Store Some Files

First, let’s upload a few files.

echo "Hi there, internet" > ~/hello-manta.txt
# Putting up a simple file
mput -f ~/hello-manta.txt ~~/stor/hello-manta.txt

# Can get it again
mget ~~/stor/hello-manta.txt

# Can put any content, here we pipe an RFC straight from the IETF site
curl -sL https://tools.ietf.org/rfc/rfc2616.txt | mput -p ~~/stor/blog/rfc2616.txt

# List directory
mls ~~/stor/blog/
# List more
mls -l ~~/stor/blog/

# Find stuff (-n takes a regular expression, so quote it to keep the shell from expanding it)
mfind -n 'rfc.*\.txt$' ~~/stor/blog/

# Delete everything
mrm -r ~~/stor/blog/

To upload we use mput. The -f option uploads the specified file. The ~~/stor/hello-manta.txt argument specifies where to put the file in Manta. The ~~ is like the Unix home directory, just on Manta. mput can also upload from a pipe. Here we upload an RFC, right from the curl output. The -p option creates the parent directories if they are missing. mls lists the content of a Manta directory and mfind finds files based on their name. Finally, mrm deletes files and directories.
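To see which absolute path hides behind a ~~ shortcut, here is a small sketch. It assumes that ~~ is shorthand for /$MANTA_USER, which matches the /Gamlor/stor/... paths that show up later in this post. expand_manta_path is a hypothetical helper of ours, not a Manta command.

```shell
# Hypothetical helper: print the absolute Manta path behind a ~~ shortcut.
# Assumption: ~~ stands for /$MANTA_USER.
expand_manta_path() {
    case "$1" in
        '~~'|'~~/'*)
            # ${1#??} drops the two leading tilde characters.
            printf '/%s%s\n' "$MANTA_USER" "${1#??}"
            ;;
        *)
            # Anything else is printed unchanged.
            printf '%s\n' "$1"
            ;;
    esac
}
```

With MANTA_USER set, expand_manta_path '~~/stor/hello-manta.txt' prints the full path including the user name. Note the quotes around the argument, so the local shell leaves the tildes alone.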

Manta’s Secret: Compute

First, let’s upload some videos.

curl -sL http://www.caminandes.com/download/01_llama_drama_1080p.zip | mput -p ~~/stor/blog/caminandes_01.zip
curl -sL http://www.caminandes.com/download/02_gran_dillama_1080p.zip | mput -p ~~/stor/blog/caminandes_02.zip
curl -sL http://www.caminandes.com/download/03_caminandes_llamigos_1080p.mp4 | mput -p ~~/stor/blog/caminandes_03.mp4

So, we stored the videos on Manta. Maybe we want to create mobile friendly versions of these videos. So we download each video, transcode it and upload it again? Even if a video is a few GBytes, we download and upload it again? No! Manta can compute, so we can convert the video right there, on Manta. Aha, so we have to learn a new framework? No! Manta just uses regular Unix programs. Let’s start!

# Log in to your stored file.
# After that, you get a regular prompt
mlogin ~~/stor/blog/caminandes_03.mp4

# Check out the environment
env
# MANTA_INPUT_FILE contains the path of the actual file
# MANTA_INPUT_OBJECT contains the name in the store
MANTA_OUTPUT_BASE=/Gamlor/jobs/f018b6a5-aa28-44fb-bc30-b5cca7a1db60/stor/Gamlor/stor/blog/caminandes_03.mp4.0.
MANTA_INPUT_FILE=/manta/Gamlor/stor/blog/caminandes_03.mp4
MANTA_INPUT_OBJECT=/Gamlor/stor/blog/caminandes_03.mp4

# Try out things you wish to do with the file. For a video, ffmpeg can transcode it to a smaller size.
# Like this one, which makes it much smaller, for a phone for example
ffmpeg -i $MANTA_INPUT_FILE -strict -2 -b:v 500k -s 320x240 -vcodec mpeg4 -acodec aac ~/small.mp4

# Once you have explored, you can exit
exit

We can log in to the Manta file with mlogin. Really! This way we can run any Unix program right next to the file and compute anything right there. We want to create a smaller video, so we use ffmpeg. However, we want to transcode all videos, not just one. So let’s use mjob for that.

#mjob: Takes a list of Manta paths on stdin and applies the program to each of them. -o waits for the job and prints its stdout
echo ~~/stor/blog/caminandes_03.mp4 | mjob create -o -m 'sha1sum'
#=> added 1 input to 61c30a0c-c7f0-e09c-c995-8fc5d61b06c3
#=> ddd2bf01f87be76b875efafd46c9930f722113b5  -

#So, with mfind, we can now do calculations across files
mfind  ~~/stor/blog/ | mjob create -o -m 'sha1sum'
#=> added 3 inputs to fb3aa947-b637-c5ea-8d38-b42620cbef1d
#=> 07acf89b74b677432acc3ee6579bcbb8ee13640e  -
#=> 0e502cad377f75973c597eab2318f39aa4763ad4  -
#=> ddd2bf01f87be76b875efafd46c9930f722113b5  -

#Or a prettier listing.
# First we hash the data: sha1sum
# Then extract the first column. Note that we escape the dollar sign: awk "{print \$1}"
# Last, combine the hash and the file name: echo $(cat) $(basename $MANTA_INPUT_OBJECT)
mfind ~~/stor/blog/ | mjob create -o -m 'sha1sum | awk "{print \$1}"| echo $(cat) $(basename $MANTA_INPUT_OBJECT)'
#=> added 3 inputs to 8bc9b130-9b73-47c5-dec9-ed4e87cdc761
#=> 07acf89b74b677432acc3ee6579bcbb8ee13640e caminandes_02.zip
#=> ddd2bf01f87be76b875efafd46c9930f722113b5 caminandes_03.mp4
#=> 0e502cad377f75973c597eab2318f39aa4763ad4 caminandes_01.zip



# For our movie transformation, we first unzip the zip files.
# Zip files are not streamable. So we ditch the stdin and read from the file: unzip $MANTA_INPUT_FILE -d ~/out < /dev/null
# Then, get the line with the output file name: tail -n 1
# Extract the actual file name column: awk '{print $2}'
# Push that file to stdout: xargs cat
# And use Manta's mpipe to create a named Manta output file: mpipe ${MANTA_INPUT_OBJECT}.mp4
mfind -t o -n '.zip$' ~~/stor/blog/ | mjob create -w -m "unzip \$MANTA_INPUT_FILE -d ~/out < /dev/null \
| tail -n 1 | awk '{print \$2}' | xargs cat | mpipe \${MANTA_INPUT_OBJECT}.mp4"

# Last step. Transcode the videos to a mobile format:
# The downloaded video might not be streamable. So we cannot take it from stdin and just read it from the file.
# First transcode it to a 600 kbit/s stream, 320x240 resolution, mp4 format: ffmpeg -nostdin -i \$MANTA_INPUT_FILE -strict -2 -b:v 600k -s 320x240 -vcodec mpeg4 -acodec aac ~/tmp.mp4
# Cat the file, pipe it to a named Manta output file: cat ~/tmp.mp4 | mpipe \${MANTA_INPUT_OBJECT}.mobile.mp4
mfind -t o -n '.mp4$' ~~/stor/blog/ | mjob create -w -m "ffmpeg -nostdin -i \$MANTA_INPUT_FILE \
-strict -2 -b:v 600k -s 320x240 -vcodec mpeg4 -acodec aac ~/tmp.mp4 > /dev/null && cat ~/tmp.mp4 | mpipe \${MANTA_INPUT_OBJECT}.mobile.mp4"

# After completion:
mls ~~/stor/blog
#=> caminandes_01.zip
#=> caminandes_01.zip.mp4
#=> caminandes_01.zip.mp4.mobile.mp4
#=> caminandes_02.zip
#=> caminandes_02.zip.mp4
#=> caminandes_02.zip.mp4.mobile.mp4
#=> caminandes_03.mp4
#=> caminandes_03.mp4.mobile.mp4

# fetch a mobile video to check it out:
mget ~~/stor/blog/caminandes_03.mp4.mobile.mp4 > caminandes_03.mp4.mobile.mp4

mjob takes Manta file paths on stdin. mjob create starts a new computation. -o waits for completion and shows the results. After the -m option we specify the computation.

1st example: Compute ~~/stor/blog/caminandes_03.mp4’s sha1.

2nd example: Find all files with mfind and compute the sha1s.

3rd example: We can use Unix pipes. So, let’s create an easy to read output.

Let’s try some useful computation. Let’s unzip caminandes_01.zip and caminandes_02.zip, then store the result with mpipe as another Manta file. -w waits until the computation has completed. And now let’s create the small, mobile friendly videos. We transcode with ffmpeg and store the small file with mpipe as a new Manta file. TATAAAA! Here are our smaller videos. We can check out a small file with mget. We did all this directly on Manta. Even if these files were large, we would not have to download and upload anything again. We did everything in Manta.
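Map commands are plain shell, so we can roughly try them on local files before creating a job. The sketch below is only a local stand-in: local_map is a hypothetical helper of ours, it runs the files one after another, and it points both MANTA_INPUT_FILE and MANTA_INPUT_OBJECT at the local path, while on Manta the second one holds the object name in the store. Tools like mpipe are not available locally, so this fits commands that only read the file or stdin.

```shell
# Hypothetical local stand-in for one map phase (not a Manta tool).
# Usage: local_map 'COMMAND' FILE...
local_map() {
    cmd=$1
    shift
    for file in "$@"; do
        # Feed the file on stdin and expose its path through the two
        # variables the map commands in this post read.
        # Assumption: locally both variables hold the same path.
        MANTA_INPUT_FILE=$file MANTA_INPUT_OBJECT=$file sh -c "$cmd" < "$file"
    done
}
```

For example, local_map 'sha1sum | awk "{print \$1}" | echo $(cat) $(basename $MANTA_INPUT_OBJECT)' ./*.txt prints one hash and file name per local file, the same shape as the pretty listing from the 3rd example (assuming sha1sum is installed locally).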

Map Reduce

So far we only used mjob create -m. mjob can also do map reduce. When we need some summary type of computation, we use the map reduce feature. Here is an example, where we calculate a summary of the video bit rates:

# Let's list the bit rate:
# First print the video info: ffprobe  $MANTA_INPUT_FILE 2>&1 (ffprobe writes the info to stderr)
# Then find the bitrate line: grep bitrate
mfind -n mp4$ ~~/stor/blog | mjob create -o -m 'ffprobe  $MANTA_INPUT_FILE 2>&1 | grep bitrate'
#=> added 6 inputs to 00abc11b-2bad-405c-8add-941400614cc4
#=> Duration: 00:02:26.05, start: 0.000000, bitrate: 6900 kb/s
#=> Duration: 00:02:30.13, start: 0.000000, bitrate: 10680 kb/s
#=> Duration: 00:02:30.12, start: 0.021333, bitrate: 717 kb/s
#=> Duration: 00:01:30.02, start: 0.023220, bitrate: 672 kb/s
#=> Duration: 00:02:26.08, start: 0.021333, bitrate: 725 kb/s
#=> Duration: 00:01:30.00, start: 0.000000, bitrate: 3120 kb/s

# Let's list the bit rate again:
# This time only extract the bit rate column: awk "{print \$6}"
mfind -n mp4$ ~~/stor/blog | mjob create -o -m 'ffprobe  $MANTA_INPUT_FILE 2>&1 | grep bitrate | awk "{print \$6}"'
#=> added 6 inputs to 6c1b8b80-1516-e8e5-f6b6-99c5ebcd9f3b
#=> 6900
#=> 672
#=> 10680
#=> 717
#=> 725
#=> 3120

# With the reduce phase we can collect the results back together.
# For example, get the max, min and mean bit rate of all our videos
mfind -n mp4$ ~~/stor/blog | mjob create -o -m 'ffprobe  $MANTA_INPUT_FILE 2>&1 | grep bitrate | awk "{print \$6}"' \
-r 'maggr max,min,mean'

First, we extract the video’s info with mjob and ffprobe. We look for the bitrate line with grep and pick the right column, the 6th, with awk. Finally, we specify the reduce step after the -r parameter. Here we use maggr to do some statistics. (^_^)
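To get a feeling for what the reduce step does, here is a local sketch that feeds the six bit rates from the listing above into a single process. It uses plain awk as a stand-in and is not maggr. The summarize name and the output format are our own.

```shell
# The six bit rates (kb/s) printed by the map phase above.
bitrates='6900
672
10680
717
725
3120'

# Hypothetical stand-in for the reduce step: one process reads every
# line the map phase produced and prints max, min and mean.
summarize() {
    awk '
        NR == 1  { max = $1; min = $1 }
                 { sum += $1 }
        $1 > max { max = $1 }
        $1 < min { min = $1 }
        END      { if (NR > 0) printf "max=%d min=%d mean=%.1f\n", max, min, sum / NR }
    '
}

printf '%s\n' "$bitrates" | summarize
```

The point is the shape of the computation: the map phase runs once per video, while the reduce phase sees all map outputs together, which is what a summary like max, min or mean needs.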

Explore Manta

I skipped many features and topics. Take a look at the Manta documentation and try it out.

Tags: Joyent Manta