Memcached and MySQL – What good is it?

I posted this in response to a post on GigaOM, but the comment ran so long that I felt it deserved a post of its own.

The workloads of social networking sites fall mostly into the ‘read lots, write once’ class (most of the web fits this pattern). Regardless of which company is behind the database software, the main idea in scaling a read-heavy workload is to take the burden off the database and move it to distributed memory stores.

As an engineer, you want applications to pull from the same cache pool to reduce I/O pressure. To ensure that every machine isn’t replicating data in individual caches, you have to go distributed. That’s the win with memcached.

Putting a distributed cache between the application and the database increases performance and shares data across your application servers, something that the database cannot do on its own. The database has on-disk and in-memory caching, but once your working set outgrows a single host’s memory, those caches stop helping.

Memcached also covers up replication lag (MySQL is terrible at replication, Oracle not so much) in large environments by putting data into the distributed cache (write-through caching) before the slave database has finished its write. Data is available to clients immediately, before replication has completed.
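A minimal sketch of that write-through pattern in Python, assuming plain dictionaries as stand-ins for both the memcached pool and the database. The class and key names here are hypothetical, and a real deployment would talk to memcached and MySQL through their client libraries.

```python
class DictStore:
    """In-memory stand-in for a memcached pool or a database table."""

    def __init__(self):
        self.data = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value


class WriteThroughCache:
    """Hypothetical wrapper: every write goes to the database and the cache."""

    def __init__(self, cache, database):
        self.cache = cache
        self.database = database

    def write(self, key, value):
        # Write to the master database, then to the shared cache right away,
        # so readers do not have to wait for replicas to catch up.
        self.database.set(key, value)
        self.cache.set(key, value)

    def read(self, key):
        value = self.cache.get(key)
        if value is None:
            # Cache miss: fall back to the database and repopulate the cache.
            value = self.database.get(key)
            if value is not None:
                self.cache.set(key, value)
        return value


cache, database = DictStore(), DictStore()
store = WriteThroughCache(cache, database)
store.write("user:42:name", "alice")
print(store.read("user:42:name"))  # served from the cache
print(database.reads)              # the database was never read
```

The read path also shows what an eviction costs: the next read for that key falls through to the database and then refills the cache.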

The cache also saves a great deal of work when you’re constantly executing that O(n × m) query to find out who is friends with whom on your social networking site.

This comes with a cost, though. Relational database functions, like joining across large data sets, and atomic operations, become very difficult to execute. Memcached becomes the central server, and there is always a fear that an important key will drop out of cache because of a random eviction.

It’s not without risk, either. Dependence on the cache can hurt you severely if lots of memcached servers fail (and they do fail), leaving you in a ‘cold cache’ situation where it can take hours to repopulate your working set back into the cache pool.

Don’t question MySQL’s performance: relational databases are great, but they are not the only solution to storage problems. The two problems being solved here are highly orthogonal.

I’d also like to state that the majority of alternative key-value store databases listed in Richard Jones’ article and in Leonard Lin’s blog are really not ready for high production loads (with maybe the exception of Tokyo Cabinet, HDFS, and Cassandra). There is still a ton of ‘secret sauce’ the large sites are keeping quiet about in order to make these into effective data stores.

Lin states this in his review as well: “Your comfort-level running in prod may vary, but for most sane people, I doubt you’d want to.”

Tread lightly.

Posted by John Adams on May 17th, 2009

Announcing mod_memcache_block

I’m announcing the release of mod_memcache_block, a distributed IP blocking system for Apache, with rate limiting based on HTTP response code.

For many years I’ve had a need for a module like this: a distributed blocking system which could operate across large web serving clusters and register hits in a central store. With rate limiting, incrementing counters on a single host is fairly useless when you have hundreds of servers behind a load balancer.

An attacker could hit many machines within the limit period before being detected, because there would be no central count. By keeping the counts in a memcache pool, all servers share the same data.
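To make the shared-count idea concrete, here is a small Python sketch of a fixed-window counter. It is not the module’s code: `SharedCounters` is a hypothetical in-memory stand-in for the memcache pool, and the key layout, limit, and window length are assumptions chosen for illustration.

```python
import time


class SharedCounters:
    """Stand-in for a memcached pool; a real pool offers an atomic incr."""

    def __init__(self):
        self.counts = {}

    def incr(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]


def allow_request(counters, ip, limit=5, window_seconds=60, now=None):
    """Return True while `ip` is within `limit` hits for the current window."""
    now = time.time() if now is None else now
    # All hits in the same time window share one counter key.
    window = int(now // window_seconds)
    key = f"hits:{ip}:{window}"
    return counters.incr(key) <= limit


pool = SharedCounters()
# Six requests from one address. Each call could come from a different web
# server; because they share one pool, the sixth is refused.
results = [allow_request(pool, "192.0.2.10", now=1000.0) for _ in range(6)]
print(results)
```

With per-host counters, each of those six requests could land on a different server and none would ever see more than one hit.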

It won’t defend against attacks coming from random proxy addresses (say, Tor), and might unfairly count hundreds of users who live behind a single proxy (like corporate NAT), but it offers some protection against attacks coming from a single source IP.

The software is released under the Apache 2.0 Open Source License.

From the docs:

mod_memcache_block is an Apache module that allows you to block access to your servers using a block list stored in memcache. It also offers distributed rate limiting based on HTTP response code.

FEATURES

Distributed White and Black listing of IPs, ranges, and CIDR blocks
Configurable timeouts, memcache server listings
Support for consistent hashing using libmemcached’s Ketama
Windowed rate limiting based on response code (to block brute-force dictionary attacks against .htpasswd, for example)
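The white and black listing of IPs and CIDR blocks comes down to a membership test per request. The Python sketch below shows one way to express it. The function name is hypothetical, the lists are passed in directly rather than read from memcache, arbitrary start-end ranges are not handled, and letting the white list win over the black list is an assumption rather than documented module behavior.

```python
import ipaddress


def is_blocked(client_ip, blacklist, whitelist=()):
    """Return True if client_ip matches the blacklist and not the whitelist."""
    address = ipaddress.ip_address(client_ip)

    def matches(entries):
        # A bare IP such as "192.0.2.7" is treated as a single-host network.
        return any(
            address in ipaddress.ip_network(entry, strict=False)
            for entry in entries
        )

    # Assumed precedence: a whitelist match overrides any blacklist match.
    if matches(whitelist):
        return False
    return matches(blacklist)


print(is_blocked("203.0.113.9", ["203.0.113.0/24"]))
print(is_blocked("203.0.113.9", ["203.0.113.0/24"], ["203.0.113.9"]))
```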

REQUIREMENTS

libmemcached-0.25 or better
Memcached server
Apache 2.x (tested with 2.2.11)
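The consistent hashing mentioned in the feature list is what lets servers join or leave the memcache pool without remapping most keys. The Python sketch below is a simplified hash ring and not libmemcached’s actual Ketama implementation. The server names, key format, and number of points per server are all made up for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Simplified consistent-hash ring; not libmemcached's exact Ketama."""

    def __init__(self, servers, points_per_server=100):
        # Each server is placed at many pseudo-random positions on the ring.
        self.ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(points_per_server)
        )
        self.positions = [position for position, _ in self.ring]

    @staticmethod
    def _hash(text):
        return int.from_bytes(hashlib.md5(text.encode()).digest()[:4], "big")

    def server_for(self, key):
        # First point clockwise from the key's position, wrapping at the end.
        index = bisect.bisect(self.positions, self._hash(key)) % len(self.ring)
        return self.ring[index][1]


servers = ["cache1:11211", "cache2:11211", "cache3:11211"]
keys = [f"block:192.0.2.{n}" for n in range(200)]
before = HashRing(servers)
after = HashRing(servers + ["cache4:11211"])
moved = sum(before.server_for(k) != after.server_for(k) for k in keys)
print(f"{moved} of {len(keys)} keys moved after adding a server")
```

Only keys that now fall on the new server’s points move; with a plain `hash(key) % server_count` scheme, adding a server would reshuffle most of them.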

Source code is available here:
http://github.com/netik/mod_memcache_block

If you would like to work on mod_memcache_block, contact me with your GitHub username and I’ll give you commit access on GitHub.

Posted by John Adams on May 7th, 2009

Velocity Preview

There’s a small interview with me in today’s O’Reilly Radar, where I talk about some of the things that I’ll be presenting as part of my Velocity 2009 talk. You can listen to it and read the transcript here:

Posted by John Adams on May 7th, 2009

Using GPS to enhance social networking

A bit of last-minute news, but I’ll be on a panel at SXSW Interactive: “Using GPS & Location to Enhance Social Networking”.

First there were social networks, and then there were location-based social networks, and now GPS and navigation-enhanced mobile social networks. This panel will explore how these emerging platforms integrate with existing social networks (Facebook, Twitter, etc.), leverage GPS navigation functionality, and take location-aware social networking to the next level.

It’s at 5pm on Tuesday, the 17th of March.

More details are available on the panel’s schedule page.

Posted by John Adams on February 13th, 2009

Twitter in New York magazine

Normally I don’t re-post Twitter articles here, but this one in New York magazine was wistful, fair, balanced, and gave a good representation of what it’s like to work here.

The reporter was in the office on the very day the US Airways flight crashed into the Hudson, and he recorded our (completely boring) reactions to the event.


Sure, the Twitter guys still have no idea how to make money off their fabulous invention. But for now they are living in a dreamworld of infinite possibilities, maybe the last one on Earth.

How Tweet it Is – New York Magazine

Posted by John Adams on February 9th, 2009

Find all the virtual hosts on a single IP

Since people started using virtual hosts by name with Apache HTTPD and other web servers, it has become very difficult to figure out which virtual hosts live on a single IP, if all you have is the IP address.

Have a look at the Robtex Internet Swiss Army Knife. It solves this problem, and far more, including AS# lookups, BGP dereferencing, and DNS checks. There’s a Firefox search toolbar available for the site (very useful!) and RBL (blacklist) check tools right on the main page of the site.

Posted by John Adams on February 2nd, 2009

Finding usernames through iTunes DAAP

Often on our local network, someone will start using up all of our outbound Internet bandwidth, and this leads to the network administrator’s dilemma:

How do we find the user in question so we can thump them on the head to make them stop?

This is a basic exercise in information gathering. For the most part, we’ll have the user’s IP address, and we’re a Mac shop with many users running iTunes. If the user is sharing their library, you can use iTunes as a covert means of determining the user’s name, as iTunes uses the local computer’s name as the library name.

Telnet to the machine’s DAAP port, and issue:


John-adamss-macbook-pro:~ jna$ telnet x.x.x.x 3689
Trying x.x.x.x...
Connected to x.x.x.x.
Escape character is '^]'.
GET /server-info HTTP/1.1
Host: x.x.x.x
Client-DAAP-Version: 3.7
User-Agent: iTunes/8.0.2 (Macintosh; N; Intel)
Accept-Language: en-us, en;q=0.50

HTTP/1.1 200 OK
Date: Tue, 13 Jan 2009 21:26:38 GMT
DAAP-Server: iTunes/8.0.2 (Mac OS X)
Content-Type: application/x-dmap-tagged
Content-Length: 280

msrvmstt?mproaproaeSVaeFPatedmsedmsmlmsmOk?[minmUSER NAME’s LibrarymslrmstmsalmsasmsupmspimsexmsbrmsqymsixmsrsmsdcmstcImmsto???

Other options for this include attempting to sign on to the server with Apple-K if AFP on TCP port 548 is active (which will reveal the computer’s name) and using nmap with service detection to glean information about the host.
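For the DAAP approach, the library name can also be pulled out of the response body programmatically. The body is a tagged binary format in which, as far as I know, each chunk is a four-character tag, a four-byte big-endian length, and then the data. The Python sketch below walks that structure looking for the `minm` (item name) tag. The helper names are hypothetical, the list of container tags is an assumed subset, and the payload is synthetic rather than a captured response.

```python
import struct

# Assumed subset of container tags; a full DMAP tag table is much longer.
CONTAINER_TAGS = {b"msrv"}


def iter_chunks(payload):
    """Yield (tag, data) for each chunk: 4-byte tag, 4-byte length, data."""
    offset = 0
    while offset + 8 <= len(payload):
        tag = payload[offset:offset + 4]
        (length,) = struct.unpack(">I", payload[offset + 4:offset + 8])
        yield tag, payload[offset + 8:offset + 8 + length]
        offset += 8 + length


def find_library_name(payload):
    """Return the first 'minm' (item name) value found, or None."""
    for tag, data in iter_chunks(payload):
        if tag == b"minm":
            return data.decode("utf-8", errors="replace")
        if tag in CONTAINER_TAGS:
            name = find_library_name(data)
            if name is not None:
                return name
    return None


def chunk(tag, data):
    """Build one chunk; used here only to create a synthetic test payload."""
    return tag + struct.pack(">I", len(data)) + data


# Synthetic body for illustration, not a captured response.
body = chunk(
    b"msrv",
    chunk(b"mstt", struct.pack(">I", 200))
    + chunk(b"minm", "USER NAME's Library".encode("utf-8")),
)
print(find_library_name(body))
```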

Posted by John Adams on January 13th, 2009

Netgear fixes WNR3500 bandwidth issues, somewhat.

On this page, Netgear has released firmware version 1.0.30 for the WNR3500 router.

In my previous MacBook Pro to local network host (Mac Mini) testing, my top connection speed was around 2.4 Mbps. After the upgrade, it’s between 4.65 Mbps and 7.5 Mbps. That is nowhere near the promised speed of 802.11n (300 Mbit/sec), but I suspect that this is because of an incompatibility between Apple’s hardware and Netgear’s hardware.

------------------------------------------------------------
Client connecting to 10.1.1.15, TCP port 5001
TCP window size:   129 KByte (default)
------------------------------------------------------------
[  3] local 10.1.1.70 port 51617 connected with 10.1.1.15 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  8.59 MBytes  7.19 Mbits/sec

Update:

After disassociating and reassociating with the AP, speeds went way up:

retina:~ jna$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------
[  4] local 10.1.1.15 port 5001 connected with 10.1.1.70 port 52865
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  34.8 MBytes  29.1 Mbits/sec

With other devices on the WLAN, speeds go down. My current theory is that 802.11g devices on the same wireless network (such as the older MacBooks that we have) drag 802.11n speeds down, but I have yet to prove that.
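As a sanity check on the iperf figures above, the transfer and bandwidth columns can be reconciled with a few lines of Python. This assumes iperf counts an MByte as 1024 × 1024 bytes and an Mbit as 1,000,000 bits, which is my understanding of its default units. The function name is just for illustration.

```python
def mbits_per_sec(transfer_mbytes, seconds):
    """Convert an iperf 'MBytes' transfer into Mbits/sec."""
    bits = transfer_mbytes * 1024 * 1024 * 8  # MBytes assumed to be 2**20 bytes
    return bits / seconds / 1_000_000         # Mbits assumed to be 10**6 bits


for mbytes in (8.59, 34.8):
    rate = mbits_per_sec(mbytes, 10.0)
    print(f"{mbytes} MBytes in 10 s = {rate:.2f} Mbits/sec")
```

Under those assumptions the two runs work out to roughly 7.2 and 29.2 Mbits/sec, close to the 7.19 and 29.1 that iperf printed. The small gap is presumably down to rounding in the displayed transfer size and interval. Even the faster run is about a tenth of the advertised 300 Mbit/sec.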

Posted by John Adams on January 10th, 2009

Macworld 2009 is Upon Us

Macworld 2009 is this week in my home town of San Francisco, and while we won’t have Jobs’ famous keynote speech or the participation of Apple this year, we will have the usual barrage of Macworld parties.

A great list of events and parties is right here.

My week looks a bit like this:

Posted by John Adams on January 5th, 2009

Photocasting to iPhoto with Ruby

Today we’re going to teach you how to deal with having too many computers. Moving media around is a living hell because iPhoto and iTunes assume that you only ever possess one library. Sure, you can play music and movies purchased in the store on multiple machines, but what about your own library? How do you use that on multiple machines without moving things around?

At home I have a number of Macs, with one large machine (~1.5 TB disk, 4 GB RAM) dedicated to digital photo editing. This machine houses a large volume of photos in its “Final Exports” folder. It’s not my main computer; that is a MacBook Pro which travels with me nearly everywhere, and when I don’t have that, I have my iPhone.

I want my photos with me everywhere (or, at least, the last few hundred of them) so I can show people the last great event I went to, or that thing in the club that time. Here’s my solution.

1) Keep the photos on the large machine, where I edit photos in Adobe Lightroom and export them to the “Final Exports” folder.

2) Keep the laptop as the primary sync machine for the iPhone

3) Sync the iPhone to the laptop, and retrieve the latest photos.

iPhoto 7 has a wonderful feature called Photocasting which will read lists of the latest photos from the Internet (Flickr, for example) using a format that is very similar to RSS, but not at all compliant with current RSS standards.

The following Ruby script and ERB template will turn a directory of directories into a pubsub feed for iPhoto. You save your files in this form:

Final_Exports/dir1

Final_Exports/dir1/1.jpg

Final_Exports/dir1/2.jpg (and so on…)

Final_Exports/dir2

Final_Exports/dir…

Final_Exports/dirN (and so on…)

I use the scripts to generate RSS, and then put the RSS file somewhere on the Internet (the same directory as the photos works well, as my machines are Internet-accessible). Running the script from cron once a day and syncing the phone keeps you up to date.

Scripts:

makeiphotorss.rb

makeiphotorss.erb
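The linked scripts are not reproduced here. As a rough illustration of the same idea, the Python sketch below walks a directory of directories and emits a plain RSS 2.0 feed with one enclosure per photo. It is not the Ruby script above, the function name and base URL are hypothetical, and since iPhoto’s photocast format departs from standard RSS, a feed like this would likely need extra elements before iPhoto accepts it.

```python
import os
import tempfile
from urllib.parse import quote
from xml.sax.saxutils import escape


def build_feed(root, base_url, title="Final Exports"):
    """Return RSS 2.0 text listing every .jpg under root's subdirectories."""
    items = []
    for album in sorted(os.listdir(root)):
        album_path = os.path.join(root, album)
        if not os.path.isdir(album_path):
            continue
        for name in sorted(os.listdir(album_path)):
            if not name.lower().endswith(".jpg"):
                continue
            size = os.path.getsize(os.path.join(album_path, name))
            url = escape(f"{base_url}/{quote(album)}/{quote(name)}",
                         {'"': "&quot;"})
            items.append(
                "    <item>\n"
                f"      <title>{escape(album + '/' + name)}</title>\n"
                f'      <enclosure url="{url}" length="{size}" type="image/jpeg"/>\n'
                "    </item>"
            )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<rss version="2.0">\n'
        "  <channel>\n"
        f"    <title>{escape(title)}</title>\n"
        f"    <link>{escape(base_url)}</link>\n"
        f"    <description>{escape(title)}</description>\n"
        + "\n".join(items) + "\n"
        "  </channel>\n"
        "</rss>\n"
    )


# Demo against a throwaway directory laid out like Final_Exports/dir1/1.jpg.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "dir1"))
    for name in ("1.jpg", "2.jpg"):
        with open(os.path.join(root, "dir1", name), "wb") as handle:
            handle.write(b"not a real image")
    feed = build_feed(root, "http://example.com/Final_Exports")
print(feed)
```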

Posted by John Adams on December 31st, 2008
