
Warning

As noted in the news section, this python version has been superseded by a version in limbo, for inferno.

Apology

Since I do not feel like making a website, you'll only find the README from the distribution here. Latest version: pylyrics-17.tgz.

Try

Try an installed version of pylyrics: Live pylyrics.

News

1-11-2007 - pylyrics has been superseded by a lyricd implementation for inferno
This version is no longer maintained. Actually, I have been running the inferno lyricd for the past few months.
23-04-2007 - pylyrics-17.tgz
Fix lyrc, they changed their html. Fix quotes and spurious html in the output, which is no longer html-escaped (e.g. &), my bad.
08-04-2007 - pylyrics-16.tgz
Fix sing365 and lyricsdownload, they changed their site. Slight clarification in the protocol for when an error occurs during a lyric get. Never return an empty lyric while reporting that retrieval succeeded. There is also a new implementation of the scgi program, in limbo. It is not yet released.
15-03-2007 - pylyrics-15.tgz
Retrieve for lyricsdownload has been fixed. The lyrics are a bit more sanitized: no more leading and trailing whitespace. Possibly remaining html constructs are escaped. lyric.py search results now include the address in the printed command, if one was specified for the search.
18-02-2007 - pylyrics-14.tgz
Search on google was broken, they slightly changed the format of the results. Also elyrics lyric retrieval and plyrics search have been fixed, both changed their html. For now, the test script does not work because of the caching in lyricd.
09-02-2007 - pylyrics-13.tgz
Add 'wrap="on"' to pre html tags, this makes very long lines (e.g. on the few sites that don't have linebreaks in their lyrics) wrap at the end of the browser window instead of widening the window to the full length of the text. Also, handling of PATH_INFO and SCRIPT_NAME has been changed now that the flup author has fixed flup's handling of it.
25-12-2006 - pylyrics-12.tgz
Protocol change (now at version pylyrics-12): search and fetch can now specify a list of sites to search at. The web-interface now has URLs of the form /lyric/artist/title, optionally followed by /site (indicating which site to search; multiple sites can be specified, separated by commas), which in turn can optionally be followed by /?url (where url specifies the location of the lyric on the specified site). When a search is prefixed by a comma-separated list of sites followed by a slash, only those sites are searched. The client, lyric.py, is changed accordingly. Fixed lyricsdownload, they changed their html. Some more minor fixes. Warning: the handling of PATH_INFO and SCRIPT_NAME might be different in lighttpd than in other httpds.
21-11-2006 - pylyrics-11.tgz
Fixed searching on google, they changed the html. Fixed lyricsdownload fetch, they changed their website to have it include javascript and style tags.
22-10-2006 - pylyrics-10.tgz
Removed googlesyndication tags that got into the output. Fixed test script to test whether your lyricd is working okay. Minor bug fixes.
26-09-2006 - pylyrics-9.tgz
Major overhaul: new client/daemon separation and web-interface implemented using an scgi daemon and lighttpd. This gave a major speed boost. Also fixed problems with some sites.
27-01-2006 - pylyrics-8.tgz
Release 8. Almost (if not) all sites were broken. They all work now. Also added some debugging stuff and changed the html output a bit.
22-12-2005
Mentioned and linked live running pylyrics, updated webpage a bit.
13-08-2005 - pylyrics-7.tgz
Fix searching on google (their output changed). Do not display duplicate results. Do not display title in lyrics for plyrics. Slightly less verbose output on cgi/lyrics.py
29-06-2005 - pylyrics-6.tgz
Three new sites, rare-lyrics.com, lyricsdownload.com, elyrics.net. Also slight cleaning and more generic code.
29-06-2005 - pylyrics-5.tgz
Rate results, smaller cgi/lyrics.py code, keep artist/title information from search results, more (and more to do still).
28-06-2005 - pylyrics-4.tgz
Fixed pylyrics.py, the command-line tool. New rpcd-htmlcache.py for caching pages (for easier testing). Probably more.
27-06-2005 - pylyrics-3.tgz
Threading on cgi-script makes it usable again. Ranking of results in darklyrics by BD.
27-06-2005 - pylyrics-2.tgz
Cleaner, smaller code, bugs fixed (thanks HvH and BD) and a new site: darklyrics.com (thanks to BD).
26-06-2005 - pylyrics-1.tgz
First release of pylyrics.

Safety

MD5 (pylyrics-17.tgz) = 878b0fe3ddfd7d8b32f4917327aa12ad
SHA1 (pylyrics-17.tgz) = 20a3c3fbe032c9643ca54d6f7dd34a57819317f4
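To check a downloaded tarball against the digests above, a short Python 3 sketch using the standard hashlib module could look like this. The function names are illustrative only and are not part of pylyrics.

```python
import hashlib

# Published digests for pylyrics-17.tgz, copied from the list above.
EXPECTED = {
    "md5": "878b0fe3ddfd7d8b32f4917327aa12ad",
    "sha1": "20a3c3fbe032c9643ca54d6f7dd34a57819317f4",
}


def file_digests(path, chunk_size=65536):
    """Return the MD5 and SHA1 hex digests of the file at path."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so the whole file never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return {"md5": md5.hexdigest(), "sha1": sha1.hexdigest()}


def verify(path, expected=EXPECTED):
    """Return True only if every published digest matches the file."""
    actual = file_digests(path)
    return all(actual[name] == value for name, value in expected.items())
```

For example, verify("pylyrics-17.tgz") returns True only when both the MD5 and the SHA1 digest match.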

README

INTRODUCTION

Pylyrics searches lyric-websites for lyrics, retrieves them and
presents them without all the annoying links/moving images/banners/flash
on them.

It consists of:
- a daemon communicating over tcp using a simple text-based protocol
- a library implementing the client-side of the protocol
- a client talking to the daemon over tcp (using the library)
- a web-interface implemented as an SCGI daemon, combined with
  lighttpd, talking to the daemon over tcp

I usually have an online demo of the web-interface running on my
webserver.  Try it at:

	http://www.ueber.net/lyrics


DOWNLOAD

http://www.xs4all.nl/~mechiel/projects/pylyrics/files/pylyrics-17.tgz


NEWS / CHANGES

version 17:
	fix lyrc, they changed their html.  fix quotes and spurious html
	in the output, which is no longer html-escaped (e.g. &), my bad.

version 16:
	fix sing365 and lyricsdownload, they changed their site.  slight
	clarification in the protocol for when an error occurs during a
	lyric get.  never return an empty lyric while reporting that
	retrieval succeeded.
	there is also a new implementation of the scgi program, in limbo.
	it is not yet released.

version 15:
	retrieve for lyricsdownload has been fixed.  the lyrics are a
	bit more sanitized: no more leading and trailing whitespace.
	possibly remaining html constructs are escaped.
	lyric.py search results now include the address in the printed
	command, if one was specified for the search.

version 14:
	search on google was broken, they slightly changed the format of
	the results.  also elyrics lyric retrieval and plyrics search
	have been fixed, both changed their html.  for now, the test
	script does not work because of the caching in lyricd.

version 13:
	add 'wrap="on"' to pre html tags, this makes very long lines
	(e.g. on the few sites that don't have linebreaks in their
	lyrics) wrap at the end of the browser window instead of widening
	the window to the full length of the text.  also, handling of
	PATH_INFO and SCRIPT_NAME has been changed now that the flup
	author has fixed flup's handling of it.

version 12:
	protocol change (now at version pylyrics-12): search and fetch can
	now specify a list of sites to search at.  the web-interface now
	has urls of the form /lyric/artist/title, optionally followed by
	/site (indicating which site to search; multiple sites can be
	specified, separated by commas), which in turn can optionally be
	followed by /?url (where url specifies the location of the lyric
	on the specified site).
	when a search is prefixed by a comma-separated list of sites
	followed by a slash, only those sites are searched.  the client,
	lyric.py, is changed accordingly.  fixed lyricsdownload, they
	changed their html.  some more minor fixes.
	warning: the handling of PATH_INFO and SCRIPT_NAME might be
	different in lighttpd than in other httpds.

version 11:
        fix searching on google, they changed the html.  fix
        lyricsdownload fetch, they changed their website to have
        it include javascript and style tags.

version 10:
        removed googlesyndication tags that got into the output.
        fixed test script to test whether your lyricd is working
        okay.  minor bug fixes.

version 9:
        major overhaul, new client/daemon separation and web-interface
        implemented using an scgi daemon and lighttpd.  this gave
        a major speed boost.  also fixed problems with some sites.
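The version 12 URL scheme of the web-interface can be illustrated with a small Python 3 sketch. parse_lyric_path is a hypothetical helper, not the parser used in pylyrics-scgid.py. It assumes it is handed PATH_INFO (e.g. /lyric/artist/title/site1,site2/) and the query string (the url) separately, the way an SCGI or WSGI server passes them.

```python
from urllib.parse import unquote


def parse_lyric_path(path_info, query_string=""):
    """Split /lyric/artist/title[/site[,site...]][/?url] into its parts.

    Hypothetical helper for illustration only.  Returns
    (artist, title, sites, url): sites is an empty list when every site
    should be searched, url is None when no ?url was given.
    """
    parts = [p for p in path_info.split("/") if p]
    if len(parts) < 3 or parts[0] != "lyric":
        raise ValueError("expected /lyric/artist/title[/site[,site...]]")
    if len(parts) > 4:
        raise ValueError("unexpected path components after the site list")
    # Decode after splitting, so an encoded slash (%2F) stays in the name.
    artist, title = unquote(parts[1]), unquote(parts[2])
    sites = []
    if len(parts) == 4:
        # Multiple sites are separated by commas, e.g. sing365,azlyrics
        sites = [s for s in parts[3].split(",") if s]
    url = unquote(query_string) if query_string else None
    if url is not None and not sites:
        # The url is a location on a specific site, so a site is required.
        raise ValueError("a ?url needs a site to fetch it from")
    return artist, title, sites, url
```

For example, parse_lyric_path("/lyric/Nirvana/Polly/sing365,azlyrics/") yields ("Nirvana", "Polly", ["sing365", "azlyrics"], None).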


SITES

The following sites are searched for each query:
- sing365.com
- azlyrics.com
- plyrics.com
- lyrc.com.ar
- darklyrics.com
- rare-lyrics.com
- lyricsdownload.com
- elyrics.net

Searching is done either using the site's own search functions or
using google.

The output of the websites is easy to parse and the code for each
website is small.  Adding new sites means creating a new
lyricsites/$site.py module and adding a line of initialization to
pylyrics.py.
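A new lyricsites module might look roughly like the Python 3 sketch below. Everything in it is an assumption for illustration: the module name, the example.com search URL, the page layout matched by the regular expression, and the function names search_url and parse_lyric. The real modules define their own interface, so copy that from an existing module. The sketch does follow two rules from the changelog: html entities are unescaped in the output, and an empty lyric is never reported as a success.

```python
"""lyricsites/examplesite.py: a HYPOTHETICAL site module.

Names, URL and page layout are assumptions; the real interface expected
by pylyrics.py should be copied from an existing module in lyricsites/.
"""
import html
import re
from urllib.parse import quote_plus

NAME = "examplesite"  # hypothetical site, not one pylyrics supports
SEARCH_URL = "http://www.example.com/search?q=%s"

# Assumed layout: the lyric sits in one <div class="lyrics"> element.
LYRIC_RE = re.compile(r'<div class="lyrics">(.*?)</div>', re.S | re.I)
# A <br> plus the newline that usually follows it becomes one newline.
BR_RE = re.compile(r"<br\s*/?>\n?", re.I)
TAG_RE = re.compile(r"<[^>]+>")


def search_url(artist, title):
    """Build the site's own search URL for an artist/title query."""
    return SEARCH_URL % quote_plus("%s %s" % (artist, title))


def parse_lyric(page):
    """Extract plain-text lyrics from a fetched page, or return None."""
    m = LYRIC_RE.search(page)
    if m is None:
        return None
    text = BR_RE.sub("\n", m.group(1))  # line breaks become newlines
    text = TAG_RE.sub("", text)         # drop any remaining tags
    text = html.unescape(text).strip()  # &amp; -> &, trim whitespace
    # Never report an empty lyric as a successful retrieval.
    return text or None
```

Keeping fetching out of parse_lyric means the parsing can be tested against a saved page, much like the html cache once used for testing.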


HOW TO USE

All components are python scripts, so python needs to be installed
to use this software.

The daemon is called lyricd.py and has no external dependencies.
By default, it listens on all IPs on TCP port 7115.  It has a
manual page, lyricd.8.

The command-line client is called lyric.py and has no external
dependencies.  By default, it connects to localhost on port 7115.
It has a manual page, lyric.1.
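Because lyricd speaks plain text over TCP, a raw connection needs only Python's standard socket module. The sketch below assumes a one-line command and a one-line reply in UTF-8; the actual commands, reply format and encoding are defined in protocol.txt and are not reproduced here, and send_command is a hypothetical name.

```python
import socket

DEFAULT_HOST = "localhost"
DEFAULT_PORT = 7115  # lyricd's default port, per the README


def send_command(line, host=DEFAULT_HOST, port=DEFAULT_PORT, timeout=30):
    """Send one text command to lyricd and return the first reply line.

    Hypothetical helper: see protocol.txt for the real command syntax.
    Reading a single reply line is an assumption; real replies may span
    several lines.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.rstrip("\n").encode("utf-8") + b"\n")
        with sock.makefile("rb") as reply:
            return reply.readline().decode("utf-8", errors="replace").rstrip("\r\n")
```

lyric.py and pylyrics.py remain the supported way to talk to the daemon; a raw connection like this is mainly handy for poking at the protocol by hand.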

The web-interface is provided by lighttpd (an external webserver
which you need to install yourself) and an SCGI program named
pylyrics-scgid.py (which depends on the external library flup).
Flup can be downloaded from http://www.saddi.com/software/flup/.
Be sure to get a recent version: older versions mishandled PATH_INFO
and SCRIPT_NAME, which breaks pylyrics-scgid.  It must be extracted
in the directory where pylyrics-scgid.py resides, with the directory
(re)named "flup".  By default, pylyrics-scgid.py listens on
localhost, TCP port 4000, for SCGI connections.
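To exercise pylyrics-scgid.py without lighttpd, one can speak SCGI to port 4000 directly. In SCGI the request headers are sent as a netstring of NUL-terminated name/value pairs, with CONTENT_LENGTH first and SCGI set to 1, followed by the request body; the server answers with a CGI-style response and closes the connection. The sketch assumes SCRIPT_NAME /lyrics and a PATH_INFO of the /lyric/artist/title form, and latin-1 header values; as the changelog warns, how a web server splits these two may differ, so adjust them to match your setup. The helper names are hypothetical.

```python
import socket


def scgi_request_bytes(path_info, query_string="", script_name="/lyrics", body=b""):
    """Encode a minimal SCGI GET request: a headers netstring plus body."""
    headers = [
        ("CONTENT_LENGTH", str(len(body))),  # must be the first header
        ("SCGI", "1"),                       # required by the SCGI spec
        ("REQUEST_METHOD", "GET"),
        ("SCRIPT_NAME", script_name),        # assumed mount point
        ("PATH_INFO", path_info),
        ("QUERY_STRING", query_string),
        ("SERVER_NAME", "localhost"),
        ("SERVER_PORT", "8800"),
        ("SERVER_PROTOCOL", "HTTP/1.0"),
    ]
    payload = b"".join(
        name.encode("latin-1") + b"\0" + value.encode("latin-1") + b"\0"
        for name, value in headers
    )
    # Netstring: "<length>:<payload>," and then the raw body.
    return str(len(payload)).encode("ascii") + b":" + payload + b"," + body


def scgi_get(path_info, query_string="", host="localhost", port=4000):
    """Send the request and return the raw response once the server closes."""
    with socket.create_connection((host, port), timeout=60) as sock:
        sock.sendall(scgi_request_bytes(path_info, query_string))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

With pylyrics-scgid.py and lyricd running, print(scgi_get("/lyric/Nirvana/Polly").decode("latin-1")) should show the response headers, a blank line and the html page.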

To use lighttpd, edit the first line of the supplied lighttpd.conf.
Change it to the path where you extracted pylyrics.  The supplied
lighttpd.conf makes lighttpd listen on all IPs, TCP port 8800.
Lighttpd-ipv6.conf is the same configuration as lighttpd.conf, but
listens on IPv6.  After starting, the web-interface should be
available at http://localhost:8800/lyrics.


LICENSE & AUTHOR

All files in pylyrics are in the public domain.  08-04-2007.
Author: Mechiel Lukkien <mechiel@xs4all.nl>.
Darklyrics code by BD.


CAVEATS

Most of the lyric sites are searched using google, not through its
API (which would add a configuration burden) but through its normal
web-interface.  This means that search breaks when google changes
its page layout.  This hasn't happened often yet.  When you do a lot
of searching (lots of lyric retrieving), google will make you enter
a check-code to make sure you are not doing automated queries.


DEVELOPMENT

New sites can be added quite easily.  Start from the implementation
of an existing site; these can be found in the directory lyricsites.
For help, send me an e-mail.  Development is done locally on my
machine; there is no CVS repository.

The protocol used between the daemon and the clients is described
in protocol.txt.  It is a very simple text-based protocol.

The file test is an rc (a shell) script that searches and retrieves
a lyric from each supported site.  This can be used as a regression
test.

The file pylyrics.py is a client library.  For sample usage, see
pylyrics-scgid.py, which uses all features of the library.


TODO

- Add a timeout for search; at least check what happens when a
  website fetch times out. (IMPORTANT)
- More long term: use outgoing connections from multiple IPs, so
  that google won't detect us as doing automated searches.  Could also
  register at google to use their interface, but that is more
  trouble.
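For the first TODO item, one possible approach in Python 3 (not the pylyrics code) is to give every page fetch its own socket timeout, and the search as a whole a deadline after which slow sites are left out. fetch_page, search_all, the 20 and 30 second values, and the idea that each site search is a callable taking the query are all assumptions made for this sketch.

```python
import concurrent.futures as cf
import http.client
import urllib.request

FETCH_TIMEOUT = 20.0   # seconds per page fetch (assumed value)
SEARCH_DEADLINE = 30.0 # seconds for a whole search (assumed value)


def fetch_page(url, timeout=FETCH_TIMEOUT):
    """Fetch url and return its body as text, or None on timeout/error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # latin-1 as fallback charset is an assumption.
            charset = resp.headers.get_content_charset() or "latin-1"
            return resp.read().decode(charset, errors="replace")
    except (OSError, ValueError, LookupError, http.client.HTTPException):
        # URLError and socket timeouts are OSErrors; ValueError covers
        # malformed URLs, LookupError an unknown charset name.
        return None


def search_all(site_searches, query, deadline=SEARCH_DEADLINE):
    """Run each site's search concurrently; keep results done in time.

    site_searches maps a (hypothetical) site name to a callable taking
    the query.  Sites still running at the deadline, or that raised an
    exception, are left out of the result.
    """
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(site_searches)))
    futures = {pool.submit(fn, query): name for name, fn in site_searches.items()}
    done, _ = cf.wait(futures, timeout=deadline)
    pool.shutdown(wait=False)  # do not block on the slow sites
    results = {}
    for fut in done:
        if fut.exception() is None:
            results[futures[fut]] = fut.result()
    return results
```

Python cannot kill a thread, so a site that misses the deadline keeps running in the background; the per-fetch timeout in fetch_page is what makes such stragglers end on their own.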