Embedding a video in a Flash player makes the underlying video file hard to extract. This post describes how to discover and extract video from a Brightcove Flash player with rtmpdump.
In the past, the flv files were easily recovered by inspecting the browser cache. That method relies on the browser logging HTTP network activity, but recently Flash players have switched to RTMP, a streaming protocol, which is not logged by the browser.
If the Flash player has a fallback for mobile devices then it is rather easy to trick the player into HTML5 mode and recover an mp4 file.
I intended to extract a video from AMC’s website, which did not have an HTML5 fallback, so I had to recover the stream used by Flash player.
The first step was to examine the network activity for clues about the video source. Brightcove makes this easy because their logging activity includes information about the video.
I opened Chrome’s network activity panel and searched for any requests containing “mp4”, a common format for video on the web. This revealed a 1×1 tracking pixel with a bunch of logging parameters,
http://ma156-r.analytics.edgesuite.net/9.gif?a=S~b=9c1e67f75e9bf1686~c=D2C517F3C2DC4F16AF5BE404B5127F4DD9F3E19B~d=EA2FDD29066B9E06BFA83A8F42BF1A84C16C5DFB~e=1~f=R~g=0~h=1.2~i=1.1~k=89AEF5EBD081F29F468FCD00F20201C817C78AD1~am=D~_ac=&mp4:rtmp_pd/196217268/196217268_2330789578001_AMC-MM-606-NextOn.mp4~ag=www.amctv.com~al=Mac%20OS~dx=1.679~en=Next%20on%20Mad%20Men:%20Episode%20606~pd=760214963001~_tt=Next%20on%20Mad%20Men:%20Episode%20606~_cd_1557=196217268~cm=Akamai~v=17~w=1486~aa=cp126124.edgefcs.net~ab=ondemand/&mp4:rtmp_pd/196217268/196217268_2330789578001_AMC-MM-606-NextOn.mp4~ad=1935~ae=rtmp~af=http://admin.brightcove.com/%5B%5BIMPORT%5D%5D/79423.analytics.edgekey.net/csma/brightcove/BrightcoveCSMALoader.swf~ai=Mozilla/5.0%20(Macintosh;%20Intel%20Mac%20OS%20X%2010_8_3)%20AppleWebKit/537.31%20(KHTML,%20like%20Gecko)%20Chrome/26.0.1410.65%20Safari/537.31~aj=11,7,700,169~ak=Flash_PlugIn~an=172~ao=1311~ap=1483~at=Netscape~au=832*455~_aw=rtmp://cp126124.edgefcs.net/ondemand/&mp4:rtmp_pd/196217268/196217268_2330789578001_AMC-MM-606-NextOn.mp4~ay=csma-3.0.27:brightcoveLoader-1.0.25~az=1.2~ba=900000~bb=23.59.190.199~va=1~_cd_1759=cp126124.edgefcs.net~mb=4845000~qoe=99.5068
One of these parameters looks like a URL,
rtmp://cp126124.edgefcs.net/ondemand/&mp4:rtmp_pd/196217268/196217268_2330789578001_AMC-MM-606-NextOn.mp4
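So what is RTMP? And how do we get a video out of it? RTMP is the Real-Time Messaging Protocol, the streaming protocol Flash uses to deliver audio and video. Googling for “edgefcs” led me to rtmpdump, a command-line tool for dumping RTMP streams to files. Usage is uncomplicated,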
rtmpdump --rtmp rtmp://cp126124.edgefcs.net/ondemand --playpath mp4:rtmp_pd/196217268/196217268_2330789578001_AMC-MM-606-NextOn.mp4 -o 606.mp4
This command downloads the RTMP stream into a local file, “606.mp4”, which VLC can play.
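The step from the logged URL to the rtmpdump arguments can be sketched in a few lines. This is only a sketch: the function name splitRtmpUrl is made up for illustration, and it assumes that the “/&” separator seen in this one URL is how the application URL and the play path are joined in general, which I have not verified for other players.

```javascript
// Hypothetical helper: split a logged RTMP URL into the two pieces that
// rtmpdump takes separately. Assumes "/&" separates the app URL from the
// play path, as in the URL found in the tracking pixel above.
function splitRtmpUrl(url) {
  var marker = '/&';
  var at = url.indexOf(marker);
  if (at === -1) {
    throw new Error('no "/&" separator found in ' + url);
  }
  return {
    rtmp: url.slice(0, at),                  // server and application
    playpath: url.slice(at + marker.length)  // stream name, e.g. "mp4:..."
  };
}

var logged = 'rtmp://cp126124.edgefcs.net/ondemand/&mp4:rtmp_pd/196217268/' +
  '196217268_2330789578001_AMC-MM-606-NextOn.mp4';
var parts = splitRtmpUrl(logged);

// Assemble the same command line used above.
var command = [
  'rtmpdump',
  '--rtmp', parts.rtmp,
  '--playpath', parts.playpath,
  '-o', '606.mp4'
].join(' ');

console.log(command);
```

The two halves correspond to the --rtmp and --playpath options in the command above.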
I recently learned that Python has a peculiar behavior when comparing tuples.
>>> () is ()
True
>>> ((),) is ((),)
False
The expression ((),) is a 1-tuple holding an empty tuple. Python reports that the empty tuple is identical to itself, but that two separately written 1-tuples holding it are not the same object. This is probably an implementation detail (the is operator compares object identity, not value), but my curiosity was sparked.
In mathematical terms, it is as if we know 0 equals 0, but we do not know whether 1 equals 1. How odd!
I decided to explore this world, but I soon discovered I needed to know a few more things. In computer programming, it’s customary to capture facts in terms of questions that we can answer.
I decided my program must know how to answer these questions,
- What is zero?
- Is something zero?
- What is something plus one?
- What is something less one?
This translates very easily into Python code,
def zero():
    """What is zero?"""
    return ()

def is_zero(arg):
    """Is something zero?"""
    return arg is zero()

def next(arg):
    """What is something plus one?"""
    return (arg,)

def prev(arg):
    """What is something less one?"""
    if is_zero(arg):
        raise Exception("Nothing is before zero.")
    return arg[0]
This might not seem very interesting, but these four functions (or “axioms”) are extremely powerful. I have written programs that can add, subtract, multiply, compute primes, factorials, exponents, and more.
For example, here’s an addition function,
def add(left, right):
    "Returns left + right"
    while True:
        if is_zero(right):
            return left
        left = next(left)
        right = prev(right)
Writing programs in this style has been an interesting diversion for me. This number system is called “natural numbers” by some, and it’s usually a topic for school children, but I’m impressed by its simplicity and power. I hope to see what else can be accomplished, such as computing nth roots and working with negatives, fractions, and complex numbers.
This week I learned a little bit of Go. I was fascinated by the power and simplicity of goroutines and channels.
With these ideas fresh in my mind, I decided to reproduce Mark C. Chu-Carroll’s Go prime sieve in JavaScript with goroutines and channels as my toolset instead of conventional JavaScript idioms. Find my translation on GitHub.
I chose JavaScript because it shares some traits with Go, such as first-class functions and closures, and it can schedule independent units of work on its event loop, but it doesn’t have a notion of channels.
Background: goroutines and channels
A goroutine is an independent concurrent thread of control.
A channel is a mechanism for two concurrently executing functions to communicate by passing a value. A channel has two methods, read and write, which are synchronous by default. That is, a write blocks until there is a read to consume it, and a read blocks until there is a write to consume.
JavaScript implementation
Goroutines are rather trivial to approximate: a simple setTimeout schedules a function to run independently of its caller, once the currently running code has finished.
<code>function go (fn) {
  setTimeout(fn, 0);
};</code>
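One consequence worth seeing in action: setTimeout does not start a parallel thread, it only defers the function until the caller has finished. The sketch below records the order of events in an array; the names order and the pushed strings are illustrative only.

```javascript
// go() as defined above.
function go(fn) {
  setTimeout(fn, 0);
}

var order = [];

go(function () { order.push('first goroutine'); });
go(function () { order.push('second goroutine'); });

// Neither goroutine has run yet: they wait until this script finishes.
order.push('caller finished');

// Print the recorded order a little later, after the timers have fired.
setTimeout(function () {
  console.log(order.join(', '));
}, 10);
```

The entry for the caller is recorded first, because the scheduled functions cannot run until the current code gives control back to the event loop.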
Synchronous channels will require a queue of reader and writer callbacks. I decided to call the reader callbacks before the writers, but I don’t think it matters.
<code>var chan = function () {
  this.readers = []; // [cb, ...]
  this.writers = []; // [[value, cb], ...]
};
chan.prototype.read = function (cb) {
  if (this.writers.length) {
    // consume a writer
    var writer = this.writers.shift();
    cb(writer[0]);
    writer[1](writer[0]);
  } else {
    // queue the reader
    this.readers.push(cb);
  }
};
chan.prototype.write = function (value, cb) {
  if (this.readers.length) {
    // consume a reader
    var reader = this.readers.shift();
    reader(value);
    cb(value);
  } else {
    // queue the writer
    this.writers.push([value, cb]);
  }
};</code>
I also pass the writer’s value to its callback because it could be useful.
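To check the rendezvous behavior, here is a small sketch that exercises both orders: a read that waits for a write, and a write that waits for a read. It repeats the chan definition so that it runs on its own; the log array and the values 'a' and 'b' are illustrative only.

```javascript
// Same channel as above, repeated so the sketch is self-contained.
var chan = function () {
  this.readers = []; // [cb, ...]
  this.writers = []; // [[value, cb], ...]
};
chan.prototype.read = function (cb) {
  if (this.writers.length) {
    var writer = this.writers.shift();
    cb(writer[0]);
    writer[1](writer[0]);
  } else {
    this.readers.push(cb);
  }
};
chan.prototype.write = function (value, cb) {
  if (this.readers.length) {
    var reader = this.readers.shift();
    reader(value);
    cb(value);
  } else {
    this.writers.push([value, cb]);
  }
};

var log = [];
var ch = new chan();

// Reader first: the read callback is queued until a write arrives.
ch.read(function (v) { log.push('read ' + v); });
log.push('reader queued');
ch.write('a', function (v) { log.push('wrote ' + v); });

// Writer first: the write is queued until a read arrives.
ch.write('b', function (v) { log.push('wrote ' + v); });
log.push('writer queued');
ch.read(function (v) { log.push('read ' + v); });

console.log(log.join('\n'));
```

In both cases neither callback fires until the other side shows up, and the reader's callback runs before the writer's.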
The prime sieve
Mark’s Go sieve begins with an integer generator.
<code>func generate_integers() chan int {
    ch := make(chan int);
    go func(){
        for i := 2; ; i++ {
            ch <- i;
        }
    }();
    return ch;
}</code>
The problem here is that the channel write (ch <- i) blocks inside of a loop. In JavaScript, we "block" by passing a callback that is called once the procedure can continue. Here, the producer function passes itself as the callback to write.
<code>function integers () {
  var ch = new chan();
  go(function () {
    var producer = function (i) {
      ch.write(i + 1, producer);
    };
    ch.write(2, producer);
  });
  return ch;
};</code>
This Go function excludes multiples of a given prime from the given channel.
<code>func filter_multiples(in chan int, prime int) chan int {
    out := make(chan int);
    go func() {
        for {
            if i := <- in; i % prime != 0 {
                out <- i;
            }
        }
    }();
    return out;
}
</code>
We can rewrite this with another recursive callback, but this time we only have to block when we do a write.
<code>function filter_multiples (ch, prime) {
  var out = new chan();
  go(function () {
    var consumer = function (i) {
      if (i % prime != 0) {
        out.write(i, function () {
          ch.read(consumer);
        });
      } else {
        ch.read(consumer);
      }
    };
    ch.read(consumer);
  });
  return out;
};
</code>
The sieve in Go will chain a series of channels to exclude multiples of all the primes we have seen.
<code>func sieve() chan int {
    out := make(chan int);
    go func() {
        ch := generate_integers();
        for {
            prime := <- ch;
            out <- prime;
            ch = filter_multiples(ch, prime);
        }
    }();
    return out;
}</code>
We achieve the same in JavaScript with another recursive callback.
<code>function sieve () {
  var out = new chan();
  go(function () {
    var ch = integers();
    function iteration () {
      ch.read(function (prime) {
        out.write(prime, function () {
          ch = filter_multiples(ch, prime);
          iteration();
        });
      });
    };
    iteration();
  });
  return out;
};</code>
Mark's program simply reads from the sieve channel and prints out each prime number.
<code>func main() {
    primes := sieve();
    for {
        fmt.Println(<-primes);
    }
}</code>
Mine uses another recursive callback to do the same.
<code>function main () {
  var primes = sieve();
  function iteration() {
    primes.read(function (i) {
      // console.log replaces sys.puts, which current Node.js no longer provides
      console.log(i);
      iteration();
    });
  };
  iteration();
}
main();</code>
Demo
Go-flavored JavaScript: Prime Sieve
Discussion
Note that the call to main() returns (practically) immediately. The goroutine in sieve() has not executed yet, so nothing has been written to the channel, and the read inside iteration just pushes its callback onto the read queue and returns. Once the main thread of control stops, the event loop begins running the goroutines, which drive the rest of the program. (Please correct me if I'm wrong.)
Functions that return quickly are generally desirable, in JavaScript or Go, because a caller should not have to wait for a function to do some "hard work" before it returns. Instead, the hard work should occur in a separate thread of control, and its result should be passed back to the caller through some layer of indirection.
JavaScript programmers typically solve this problem with callbacks, which receive the result of the hard work. The Go language offers synchronous channels between independent threads of control as a more sophisticated solution. These tools are easily ported to idiomatic JavaScript based on callbacks. Although unconventional, these tools are simple and may be very powerful when composed.
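One way to check this ordering is to swap setTimeout for an explicit task queue, so that the "event loop" becomes a plain loop we run ourselves. The sketch below does that; the tasks array, runLoop, the events array, and the limit argument to main are all assumptions added for illustration and are not part of the program above. The limit is there so the sieve stops after a few primes instead of running forever.

```javascript
// Explicit task queue standing in for setTimeout(fn, 0), so that the
// scheduling order can be observed directly (an assumption of this sketch).
var tasks = [];
function go(fn) { tasks.push(fn); }
function runLoop() { while (tasks.length) tasks.shift()(); }

// Channel, generator, filter and sieve as defined in the post.
var chan = function () { this.readers = []; this.writers = []; };
chan.prototype.read = function (cb) {
  if (this.writers.length) {
    var writer = this.writers.shift();
    cb(writer[0]);
    writer[1](writer[0]);
  } else {
    this.readers.push(cb);
  }
};
chan.prototype.write = function (value, cb) {
  if (this.readers.length) {
    var reader = this.readers.shift();
    reader(value);
    cb(value);
  } else {
    this.writers.push([value, cb]);
  }
};
function integers() {
  var ch = new chan();
  go(function () {
    var producer = function (i) { ch.write(i + 1, producer); };
    ch.write(2, producer);
  });
  return ch;
}
function filter_multiples(ch, prime) {
  var out = new chan();
  go(function () {
    var consumer = function (i) {
      if (i % prime != 0) {
        out.write(i, function () { ch.read(consumer); });
      } else {
        ch.read(consumer);
      }
    };
    ch.read(consumer);
  });
  return out;
}
function sieve() {
  var out = new chan();
  go(function () {
    var ch = integers();
    function iteration() {
      ch.read(function (prime) {
        out.write(prime, function () {
          ch = filter_multiples(ch, prime);
          iteration();
        });
      });
    }
    iteration();
  });
  return out;
}

// Hypothetical variant of main() that stops reading after `limit` primes.
var events = [];
function main(limit) {
  var primes = sieve();
  var seen = 0;
  function iteration() {
    primes.read(function (i) {
      events.push(i);
      seen += 1;
      if (seen < limit) iteration();
    });
  }
  iteration();
}

main(5);
events.push('main returned'); // recorded before any goroutine has run
runLoop();                    // now the queued goroutines drive the sieve
console.log(events);
```

With this queue, the 'main returned' entry lands in events ahead of every prime, which matches the description above: main() only queues a reader, and all the real work happens once the loop starts running the scheduled functions.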
I hope this toy example borrowed from Go inspires JavaScript programmers to consider the unconventional.