
I am looking to mine data from older news articles for my startup. One way I can think of is to crawl financial websites and extract the information I need. That can be hard, and it is not without its pitfalls.

Is there an alternative, such as licensing the news archives at an affordable price? The articles will not be republished as-is on my site. I will link to the original article with a short blurb, the way Google News does.

Thanks, Shyam

If you will only show an excerpt, can't you crawl them (while respecting their robots.txt) and behave just like Google Search? – Alain Raynaud Apr 21 '12 at 22:37
I think crawling and scraping the sites is probably your best option. Failing that, check whether their RSS feeds accept date parameters, or whether some other URL on their website can be coaxed into returning the data in a consistent format that is easy to parse. – Ryan Doom May 22 '12 at 2:21
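
For anyone going the crawling route suggested in the comments above, here is a minimal sketch of the robots.txt check using Python's standard `urllib.robotparser`. The sample rules, the `ExampleArchiveBot` user agent, the `news.example.com` URLs, and the helper names `build_parser` and `allowed_urls` are all made up for illustration; a real crawler would fetch each site's actual robots.txt and would still need to check that site's terms of use.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from the text of a robots.txt file (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser: RobotFileParser, user_agent: str, urls: list[str]) -> list[str]:
    """Keep only the URLs that the robots.txt rules let this user agent fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical rules; a real site publishes its own at /robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

parser = build_parser(SAMPLE_ROBOTS)
candidates = [
    "https://news.example.com/archive/2011/article-1.html",
    "https://news.example.com/private/draft.html",
]
# Only the archive URL passes; the /private/ path is disallowed.
print(allowed_urls(parser, "ExampleArchiveBot", candidates))
```

Against a live site you would call `set_url()` and `read()` on the parser instead of passing in a string, and throttle requests so the crawl stays polite.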

closed as off topic by TimJ Aug 20 '12 at 14:09


1 Answer

There are firms like BoardReader and Moreover that will sell you the data and offer API-level access, but they are not particularly cheap.

I, too, would be quite interested to see what is available in machine-readable form at low or no cost.
