Archive for LinkRiver

Find Similar Links on LinkRiver

I've been noodling on this feature for a while: how can I find "more links like this one" in LinkRiver? Putting on my machine learning hat, I contemplated link-to-link co-visitation schemes, semantic indexing and various clustering algorithms, but all of these approaches were too data-heavy, at least for now. There had to be an easier way...

LinkRiver has supported full-text search of links (by river and stream) for a while now. Both the link title and the host (e.g. www.techcrunch.com) are part of the index. Could the full-text search engine help out here? Let's try it out.

One popular link today was a story on news.com about the possibility of eBay selling Skype to Google. What if I send the link host and title to the search engine? Are the results relevant?

Try it yourself: Click to see similar links

In most cases this works really well...

Twobile-Twitter for Windows Mobile
FriendFeed Has Search

But sometimes, the results are not so great:

TechMeme Leaderboard: Six Months In

One option, depending on feedback, is to stop including the link host in the search query. Play around (click similar, then re-run the search after removing the link host from the search box) and let me know what you think.
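To make the host-or-no-host option concrete, here is a minimal Ruby sketch of how a "similar links" query could be assembled from a link's URL and title. The method name `similar_links_query`, the `include_host:` flag and the punctuation stripping are my own assumptions for illustration; this isn't LinkRiver's actual search code.

```ruby
require 'uri'

# Hypothetical helper: build a "find similar links" query from a link's host
# and title. include_host: false drops the host, as discussed above.
def similar_links_query(url, title, include_host: true)
  terms = []
  terms << URI.parse(url).host if include_host
  # Replace punctuation a full-text engine may treat as query syntax
  # (colons, question marks, quotes...) with spaces, then tidy up spacing.
  terms << title.gsub(/[^\w\s.-]/, ' ').squeeze(' ').strip
  terms.join(' ')
end

puts similar_links_query('http://www.news.com/some-story.html',
                         'Report: eBay may sell Skype to Google?')
puts similar_links_query('http://www.news.com/some-story.html',
                         'Report: eBay may sell Skype to Google?',
                         include_host: false)
```

With `include_host: false`, the host drops out and only the title words drive the match, which is exactly the variation worth comparing in the search box.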

What Powers the Aggregators?

All lifestream and link-sharing aggregators use an RSS/Atom parser to help power their service.

I built LinkRiver using Ruby on Rails and would have preferred a parser written in Ruby. However, Mark Pilgrim's Universal Feed Parser (UFP) is rock-solid and very well tested, so I use UFP for feed parsing. LinkRiver controls UFP via a memcached-based message queue, and a bit of Python glue around UFP posts newly found shared links back to LinkRiver via a simple HTTP API.

A while back RSSMeme's Benjamin Golub tweeted that he also uses UFP, so I thought I'd ask around to see what some of the other aggregators are using.

Bret Taylor from FriendFeed told me they use UFP as a fall-back but rely primarily on a custom parser that uses much less memory.

ReadBurner developer Alexander Marktl replied to say that he uses MagicParser, a commercial parser for PHP.

After testing a bunch of options and finding none that worked, Tumblr's Marco Arment wrote his own parser for PHP "with regular DOM functions".

Google's Chris Wetherell has blogged about the history of Google Reader and mentioned that UFP was involved, at least in the early stages.

Any others?

Updated: See comments -- Gabe Rivera from Techmeme built his own in Perl.

Save Links for Later on LinkRiver

This happens to me all the time. I'm in super-productive mode and I run across an article or blog post that is interesting but entirely outside the context of what I'm doing. I need to stay on task - no tangents allowed.

I've tried a few things... a 'To Read' folder in my browser's bookmarks or tagging links 'toread' on del.icio.us, but these methods were either too disruptive or difficult to manage.

I tried out InstaPaper the other day and loved it - one-click and a link is saved for later. It worked great, but it didn't help me if I found something to 'later' when in Google Reader. Still too much friction.

Inspired by InstaPaper, I added a 'Save for Later' feature to LinkRiver.

Links you mark 'Later' show up under your 'Later' tab in LinkRiver. These links are private and not shared with your followers unless you choose that explicitly.

Bookmarklets

Three Ways to Save Links for Later

There are three ways to add links to your 'Later' stream.

First - there is a new one-click bookmarklet you can add to your browser toolbar. One-click -- boom -- you've saved the link for later without leaving the page you are on. Look for these in your sidebar after logging in to LR.

Later Link

Second - links inside LinkRiver now have a 'later' option in addition to the 'share' option that's been there for a while. Again - one click and it's saved for later.

Third - this one is probably the most powerful of them all - you can import an external feed into your 'Later' stream.

I set up LinkRiver to import my Google Reader shared items into my main stream and my starred items into my Later stream. This works beautifully, especially when using Google Reader on my iPhone. Just click 'share' in GR to share on LinkRiver, or 'star' to save it for later. Sweet GTD goodness!
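Here is a rough Ruby sketch of the routing idea: each imported feed gets a target stream, and followers only see main-stream links plus any Later links you explicitly choose to share. `FeedImport`, `SavedLink` and both methods are hypothetical names, not LinkRiver's real models.

```ruby
# Hypothetical model of the 'Later' stream: each imported feed is routed to
# either the public main stream or the private Later stream.
FeedImport = Struct.new(:feed_url, :target)      # target: :main or :later
SavedLink  = Struct.new(:url, :stream, :shared)  # shared: explicit opt-in

# Turn a feed's new item URLs into links in that feed's target stream.
def import_items(feed_import, item_urls)
  item_urls.map { |url| SavedLink.new(url, feed_import.target, false) }
end

# Followers see main-stream links, plus Later links the owner chose to share.
def visible_to_followers(links)
  links.select { |link| link.stream == :main || link.shared }
end

shared_items  = FeedImport.new('http://example.com/shared.atom', :main)
starred_items = FeedImport.new('http://example.com/starred.atom', :later)

links = import_items(shared_items, ['http://a.example/1']) +
        import_items(starred_items, ['http://b.example/2'])
p visible_to_followers(links).map(&:url)  # => ["http://a.example/1"]
```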

Nice to be Noticed

Google Reader creator Chris Wetherell is writing a great series on the birth of Google Reader. In the latest, Chris mentions LinkRiver and others when he talks about "services aggregating shared items".

Gotta say I'm honored. That's kind of like UCLA basketball coach Ben Howland mentioning me, a church-league pee-wee basketball coach, in a post-game news conference.

LinkRiver Adds Attention Data

Updated: I added support for APML (Attention Profiling Markup Language) for your attention data. Mine is here.

Adam’s Attention on LinkRiver

Above is a screenshot of a new feature I've been playing around with on LinkRiver: attention data. Clicking on a user's "attention" tab shows the top sites and the top keywords from links shared by that user. Click through above to see my attention data: I'm interested in Ruby, Barack Obama, MySQL, Nginx, Twitter, the iPhone, etc. My friend Chris, a chemist for biofuel startup PrimaFuel, has different interests: energy, solar and Barack Obama. What do you think?
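For the curious, here is a minimal Ruby sketch of the "top keywords" half of that view: tokenize the titles of a user's shared links, drop stopwords and very short words, and rank what's left by frequency. The tokenizer, the tiny stopword list and `top_keywords` itself are assumptions for illustration; the post doesn't say how LinkRiver actually extracts keywords.

```ruby
# Hypothetical attention sketch: rank the most frequent keywords in the
# titles of a user's shared links. The stopword list is a tiny placeholder.
STOPWORDS = %w[a an and the of on in for to is with how why what].freeze

def top_keywords(titles, limit = 5)
  counts = Hash.new(0)
  titles.each do |title|
    title.downcase.scan(/[a-z0-9]+/).each do |word|
      next if STOPWORDS.include?(word) || word.length < 3
      counts[word] += 1
    end
  end
  # Sort by descending count, breaking ties alphabetically for stable output.
  counts.sort_by { |word, count| [-count, word] }.first(limit).map(&:first)
end

titles = [
  'Scaling Ruby with Nginx',
  'Ruby on Rails and MySQL tips',
  'Nginx as a reverse proxy',
  'Ruby 1.9 preview'
]
p top_keywords(titles, 3)  # => ["ruby", "nginx", "mysql"]
```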

Inside the LinkRiver Favicon Server - Ruby + Nginx + Thin + Rack

Favicons on LinkRiver

LinkRiver displays favicons next to most links to help users recognize link targets. Those favicons are served separately from the main LinkRiver server. This post describes some of the design decisions and approaches I took when building the favicon (FI) server.

My most important requirement for the FI server was that it be loosely coupled to the LinkRiver (LR) server and reusable for other applications. The LR server could link to a favicon for *any* page without worrying about whether the icon exists on the FI server. If the FI server already had the icon, great - it would serve it up. If not, it would send back a default icon. This requirement ruled out Amazon's S3 service because it won't allow you to return a default image/page in response to "404 Not Found" errors.

When LR wants to display the favicon for a site like Twitter, it generates a URL like this:

http://favicons.linkriver.com/f1/25/twitter.com.ico

LR knows how to "map" host names to the directory structure (f1/25 in the example above). Keeping icons in a two-tiered directory system like this makes it easier to manage the large number of cached files (it's bad to have zillions of files in one directory). It also serves as a minor obstacle to others hotlinking these favicons.
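The post doesn't say how LR derives `f1/25` from a host name. One common approach, sketched here in Ruby as an assumption, is to hash the host and use the first two byte pairs of the hex digest as the two directory levels, which spreads icons across up to 256 x 256 buckets. `favicon_path` is a hypothetical name.

```ruby
require 'digest/md5'

# Hypothetical host-to-path mapping for the two-tier favicon cache.
# Assumption: the two directory levels come from an MD5 digest of the host.
def favicon_path(host)
  host = host.downcase
  digest = Digest::MD5.hexdigest(host)
  "#{digest[0, 2]}/#{digest[2, 2]}/#{host}.ico"
end

puts "http://favicons.linkriver.com/#{favicon_path('twitter.com')}"
```

Because the path is derived from the host alone, LR can generate the URL without asking the FI server anything, which is what keeps the two loosely coupled.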

Behind the scenes it would work like this. A fast, lightweight web server like lighttpd or Nginx would sit in front of all requests to serve already-cached static files. When an uncached icon is requested, the FI server queues it up for later download. I have a lightweight, non-persistent message queuing class built on memcached and Ruby that would be perfect for this. All the FI server has to do is push the request values onto memcached and then tell the web server to send back the default icon.
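The memcached queue class itself isn't shown in the post, so here is a hypothetical Ruby sketch of how a non-persistent FIFO queue can be built on cache counters. `FakeMemcache` is an in-memory stand-in so the example runs on its own; its `get`/`set`/`incr`/`delete` methods are assumptions, and a real memcached client's method signatures may differ.

```ruby
# In-memory stand-in for a memcached client, for illustration only. A real
# client may return counter values as strings, hence the to_i calls below.
class FakeMemcache
  def initialize
    @data = {}
  end

  def get(key)
    @data[key]
  end

  def set(key, value)
    @data[key] = value
  end

  def delete(key)
    @data.delete(key)
  end

  # Increment an integer counter and return the new value.
  def incr(key, amount = 1)
    @data[key] = @data.fetch(key, 0).to_i + amount
  end
end

# Hypothetical non-persistent FIFO queue built on cache counters: 'tail'
# counts pushes, 'head' counts pops, and each item lives under its own key.
# Nothing is written to disk, so a restart or cache eviction can drop items.
# Assumes a single consumer; concurrent consumers need more coordination.
class CacheQueue
  def initialize(client, namespace)
    @client = client
    @ns = namespace
    %w[head tail].each { |name| @client.set(key(name), 0) unless @client.get(key(name)) }
  end

  def push(value)
    index = @client.incr(key('tail'))
    @client.set(key(index), value)
  end

  # Returns the oldest item, or nil when the queue is empty.
  def pop
    return nil if @client.get(key('head')).to_i >= @client.get(key('tail')).to_i
    item_key = key(@client.incr(key('head')))
    value = @client.get(item_key)
    @client.delete(item_key)
    value
  end

  private

  def key(suffix)
    "#{@ns}:#{suffix}"
  end
end

queue = CacheQueue.new(FakeMemcache.new, 'favicons')
queue.push('twitter.com')
queue.push('mashable.com')
p queue.pop  # => "twitter.com"
```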

First Attempt -- PHP via Lighttpd and FastCGI

LinkRiver is written in Ruby on Rails, using Nginx as a load balancer and static page server with Mongrel as the Ruby app server. I love working in Ruby, but for this app, Rails would have been overkill. I wasn't familiar with ways to run Ruby using a faster/lighter server, so I dusted off my trusty/rusty PHP skills. Remember - the only thing PHP had to do was push request values to memcached and tell the web server to return the default icon. Something like this:

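<?php
// Let lighttpd stream the default icon itself. This requires
// "allow-x-send-file" => "enable" in lighttpd's fastcgi.server config.
header('HTTP/1.1 200 OK');
header('Content-Type: image/x-icon');
header('X-LIGHTTPD-Send-File: /path/to/default.png');

//
// A few more lines to push the request
// values onto memcached
//
?>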

The X-LIGHTTPD-Send-File header tells lighttpd to return a static file to the browser itself, which is much faster than having PHP read and send it. I banged this out in about an hour and it worked great.

Second Attempt -- Ruby Via Nginx and Thin/Rack

My PHP+Lighttpd version of the FI server worked just fine but I didn't like supporting both Nginx and lighttpd. I also prefer coding in Ruby whenever possible. Was there a lightweight way to run Ruby on a web server? That's where Thin and Rack come in.

Thin is a wicked-fast Ruby web server that's perfect for what I was trying to do -- run a fairly simple Ruby script on a web server. Thin is the web server itself - Rack is an interface that defines how Ruby interacts with the server.

Thin runs a Rack config file that looks something like this:

require 'favicon'
require 'mcqueue'
q = MCQueue.new(QUEUE_SERVER, QUEUE_NAMESPACE)
map '/' do
  run FaviconAdapter.new(q)
end

For all requests that make it to Thin (remember - all cached icons are served by Nginx directly and never reach Thin), Thin creates an instance of my FaviconAdapter class and "runs" it, which means it will call the FaviconAdapter's "call" method and pass in information about the request. Our call method parses out some request information (the hostname for the favicon), pushes it to memcached, and returns an HTTP status code, headers and body, just like the PHP version.

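require 'rubygems'
require 'rack'

# Nginx sees X-Accel-Redirect and serves the default icon itself
DEFAULT_HEADERS = {
  'Content-Type' => 'image/x-icon',
  'X-Accel-Redirect' => '/protected/default.png'
}.freeze

class FaviconAdapter
  def initialize(queue)
    @queue = queue  # MCQueue instance shared across requests
  end

  def call(env)
    req = Rack::Request.new(env)
    #
    # A couple of lines removed to parse the request and
    # push it to memcached...
    #
    # Return a fresh copy so nothing downstream mutates the shared constant
    [200, DEFAULT_HEADERS.dup, ['']]
  end
end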

The X-Accel-Redirect does the same thing for Nginx that the X-LIGHTTPD-Send-File header does for lighttpd: it tells the web server to return the file directly instead of streaming it through our Ruby or PHP code.
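The parsing lines are elided in the adapter above, so here is a stand-alone, hypothetical Ruby sketch of what that step might look like. It assumes request paths follow the `/f1/25/twitter.com.ico` layout and that the queue responds to `push` (a plain Array works for trying it out); `SketchFaviconAdapter` and `PATH_PATTERN` are illustrative names, not the real FI code.

```ruby
# Hypothetical stand-alone version of the favicon adapter's request handling.
# Assumes paths look like /f1/25/twitter.com.ico and the queue responds to #push.
class SketchFaviconAdapter
  PATH_PATTERN = %r{\A/[0-9a-f]{2}/[0-9a-f]{2}/([a-z0-9.-]+)\.ico\z}i

  HEADERS = {
    'Content-Type' => 'image/x-icon',
    # Nginx swaps in this internal file instead of our (empty) body.
    'X-Accel-Redirect' => '/protected/default.png'
  }.freeze

  def initialize(queue)
    @queue = queue
  end

  # Rack-style entry point: takes the env hash, returns [status, headers, body].
  def call(env)
    match = PATH_PATTERN.match(env['PATH_INFO'].to_s)
    @queue.push(match[1].downcase) if match  # queue the host for download
    [200, HEADERS.dup, ['']]
  end
end

queue = []
status, headers, _body = SketchFaviconAdapter.new(queue).call('PATH_INFO' => '/f1/25/twitter.com.ico')
p status, headers['X-Accel-Redirect'], queue
```

Returning `HEADERS.dup` rather than the frozen constant means a server or middleware that adjusts response headers can't accidentally modify the shared default.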

The new Ruby FI version has been as solid and stable as the PHP version before it. It should scale better too: in my tests, Nginx handles high load better and serves static files at the same high speed as lighttpd. My Ruby code is outperforming the PHP code by about 30%, but that's not quite a fair comparison, because the Ruby version caches its connection to memcached while the PHP version must reconnect for each request.

That's all for now.

LinkRiver Deep Search

The other day my friend Chris shared a cool link on "Barack Obama's surprisingly non-ideological policy shop". I follow Chris on LinkRiver, so that link ended up in my river. I read it and shared it, so now the link shows up in my stream and in the river of those who follow me.

Fast forward a few days and I want to find that link again. Unfortunately, the link title was not very descriptive ("The Audacity of Data") so I couldn't use LR's standard stream or river search to track it down. I could have scrolled through a couple of pages of my stream... but what if I could search the text of the linked story instead of just the title?

Enter LinkRiver deep search. The name's not great and will probably change, but deep search is a Google Custom Search Engine that makes it easy to search the content of all pages in a user's stream.

Two caveats. First, your initial deep search may be a little slow because Google has to grab an XML file from us and build your index before it can return results. Second, I've limited this feature to streams (no rivers yet) for now so that the XML file Google has to pull stays a reasonable size. I'll probably add rivers later.
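As a rough illustration of the XML file Google pulls, here is a hypothetical Ruby sketch that writes one entry per link in a stream, capped so the file stays small. It assumes the Custom Search annotations layout (`<Annotations>`, `<Annotation about=...>`, `<Label name=...>`) and that `urls` is ordered newest first; the label name and the default cap of 500 are placeholders, not LinkRiver's real values.

```ruby
# Hypothetical generator for a stream's deep-search XML file, assuming the
# Custom Search annotations layout. Label name and cap are placeholders.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

def xml_escape(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Emit one <Annotation> per link, keeping only the first `limit` URLs
# (assumed newest first) so the file Google fetches stays a reasonable size.
def stream_annotations(urls, label, limit: 500)
  entries = urls.first(limit).map do |url|
    "  <Annotation about=\"#{xml_escape(url)}\">\n" \
    "    <Label name=\"#{xml_escape(label)}\"/>\n" \
    "  </Annotation>\n"
  end
  "<Annotations>\n#{entries.join}</Annotations>\n"
end

puts stream_annotations(['http://example.com/obama-policy?a=1&b=2'], '_cse_example_stream')
```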

Back to the story. The article I wanted to find was about "obama" and mentioned "policy" and "wonk". See the results yourself.

LinkRiver Leaderboard

One of these days I will implement a per-stream, per-river leaderboard for LinkRiver showing the top sources of shared links. Here are the overall leaders right now:

  1. www.techmeme.com
  2. mashable.com
  3. twitter.com
  4. www.techcrunch.com
  5. feeds.feedburner.com
  6. www.readwriteweb.com
  7. www.nytimes.com
  8. valleywag.com
  9. www.alleyinsider.com
  10. lifehacker.com
  11. www.mahalo.com
  12. www.louisgray.com
  13. venturebeat.com
  14. www.flickr.com
  15. ajaxian.com
  16. www.engadget.com
  17. www.boingboing.net
  18. blogs.zdnet.com
  19. www.crunchgear.com
  20. gigaom.com
  21. www.facebook.com
  22. www.youtube.com
  23. www.news.com
  24. andrewsullivan.theatlantic.com
  25. rss.cnn.com

Most are what you'd expect, but a few stand out. The twitter.com links generally represent people experimenting with importing their own Twitter feeds into their stream. The feeds.feedburner.com and rss.cnn.com sources represent redirecting links: not the actual destination, but an intermediary designed to count clicks. www.techmeme.com tops the list because we had a user who tracked the TechMeme firehose feed for a while and generated a ton of links very quickly.
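A per-stream leaderboard boils down to counting shared links by host. Here is a minimal Ruby sketch; `host_leaderboard` is a hypothetical name, and ties are broken alphabetically so the ranking is stable.

```ruby
require 'uri'

# Hypothetical per-stream leaderboard: count shared links by host and rank
# the hosts, breaking ties alphabetically so the output is stable.
def host_leaderboard(urls, limit = 10)
  counts = Hash.new(0)
  urls.each do |url|
    host = begin
      URI.parse(url).host
    rescue URI::InvalidURIError
      nil  # skip malformed URLs
    end
    counts[host.downcase] += 1 if host
  end
  counts.sort_by { |host, count| [-count, host] }.first(limit)
end

stream = %w[
  http://www.techmeme.com/a http://mashable.com/b
  http://www.techmeme.com/c http://twitter.com/d
]
host_leaderboard(stream).each_with_index do |(host, count), i|
  puts "#{i + 1}. #{host} (#{count})"
end
```

Since it counts whatever host appears in the shared URL, redirecting hosts like feeds.feedburner.com get the credit unless the redirect is resolved to the destination first.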

No Conditional GET for Google Reader, StumbleUpon

Conditional HTTP GETs save bandwidth and computing time by letting RSS readers download feeds only when they've changed. Most bookmarking/sharing services support this feature (del.icio.us, Ma.gnolia, FeedBurner, NewsGator, etc.), but I was surprised to find that neither Google Reader nor StumbleUpon does. This means that every time LinkRiver checks a Google Reader shared items or StumbleUpon feed for updates, it has to download the entire feed and look for new stuff. Argh :-(
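For comparison, here is a minimal Ruby sketch of what a conditional GET looks like from the reader's side, using the standard `net/http` library: send back the `ETag` and `Last-Modified` values saved from the previous fetch, and treat a `304 Not Modified` as "no new stuff". The method names are hypothetical, and a server that ignores these headers, as described above, simply returns the full feed every time.

```ruby
require 'net/http'
require 'uri'

# Build a conditional GET using validators saved from the previous fetch.
def conditional_feed_request(url, etag: nil, last_modified: nil)
  request = Net::HTTP::Get.new(URI.parse(url))
  request['If-None-Match'] = etag if etag
  request['If-Modified-Since'] = last_modified if last_modified
  request
end

# Hypothetical fetch helper: returns nil when the server answers 304, else the
# body plus the new validators to store for next time.
def fetch_feed_if_changed(url, etag: nil, last_modified: nil)
  uri = URI.parse(url)
  request = conditional_feed_request(url, etag: etag, last_modified: last_modified)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
  return nil if response.is_a?(Net::HTTPNotModified)

  { body: response.body, etag: response['ETag'], last_modified: response['Last-Modified'] }
end
```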

LinkRiver Seeded With A-List Bloggers

Brittney noticed that LinkRiver was seeded with A-list blogger accounts.

So, it seems Scoble didn't sign up for the account himself, instead he was "seeded," along with other high-profile Silicon Valley bloggers. This is an interesting marketing move to make.

LinkRiver is a classic scratch-your-own-itch project. There was no good way to aggregate the link blogs published by more and more of the bloggers I read, so I built an app to do it. I added Scoble to the system (Scoble's stream) because I wanted to follow him, same with many others (see the right side of my river).

Seeding these accounts was not intended to be a "clever marketing move" as much as a practical "I want to follow these people" move. I've already had a number of bloggers "claim" their accounts. If it bothers people - I'll gladly remove them.
