i'm doing some preliminary work, and might have to create some sort of distributed/grid application. i'm looking into extracting information from the Amazon.com book database using the AWS development services that Amazon provides. the key issue is that while Amazon permits the extraction/use of the book data from their site/servers, Amazon restricts how fast you can hit their servers from a given machine/IP. the obvious solution is to create a distributed app that would parse/extract the information and build the database.
the initial app would be a test, to make sure everything works correctly. the eventual goal would be to use the database to support a possible business venture.
the client app for this project would be a perl/python script that hits the amazon.com server and then returns the data to the test server.
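to give a rough idea of what such a client might look like, here's a minimal python sketch of a throttled worker loop. nothing here is the real AWS interface: `fetch` and `report` are hypothetical callables standing in for "query amazon for one book" and "send the result back to the test server", and `min_interval` is a placeholder for whatever per-IP limit actually applies.

```python
import time


def run_worker(book_ids, fetch, report, min_interval=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Fetch each book and report it, starting at most one request per min_interval seconds.

    fetch(book_id) and report(book_id, record) are hypothetical callables
    supplied by the caller; this loop only handles the pacing.
    Returns the number of books processed.
    """
    last_start = None
    processed = 0
    for book_id in book_ids:
        if last_start is not None:
            # Sleep off whatever remains of the minimum interval.
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        record = fetch(book_id)   # placeholder for the actual lookup
        report(book_id, record)   # placeholder for returning data to the server
        processed += 1
    return processed
```

the `sleep` and `clock` parameters are only there so the pacing can be tested without actually waiting; in normal use the defaults would do.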
the goal is to extract information for ~2-3 million books. i estimate that i'd need a network of 200-500 machines to accomplish this over 2-3 days.
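as a rough sanity check on those numbers, here's a small python sketch of the arithmetic. it assumes one request per book and an even split of the work across machines, both of which are simplifications, and the function names are just for illustration.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def required_rate_per_machine(books, machines, days):
    """Requests per second each machine must sustain (assumes one request per book)."""
    return books / (machines * days * SECONDS_PER_DAY)


def machines_needed(books, days, rate_per_machine):
    """Machines needed if each one may make rate_per_machine requests per second."""
    return math.ceil(books / (days * SECONDS_PER_DAY * rate_per_machine))


if __name__ == "__main__":
    # Most demanding case from the estimate: 3 million books, 200 machines, 2 days.
    print(required_rate_per_machine(3_000_000, 200, 2))
    # Least demanding case: 2 million books, 500 machines, 3 days.
    print(required_rate_per_machine(2_000_000, 500, 3))
```

in the most demanding case each machine would need to make roughly one request every 11-12 seconds, and in the least demanding case roughly one every 65 seconds. `machines_needed` runs the calculation the other way, once the actual per-IP limit is known.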
i'm posting this here because someone from the boinc_projects list thought there might be people here who could offer to help, or who could point me toward other places where i might find test machines.
if you'd like to help, if you're interested and want further information, or if you know of other places i could turn to for help, let me know.
thanks for whatever help (or pointers) you can give!!
-bruce
bedouglas@earthlink.net