Custom web datasets to build your competitive edge.

Turn any source of data on the web into a dataset delivered daily to your inbox.

Built by engineers from

Why use PointScrape

Consistently fresh data

PointScrape's intelligent prioritization ensures you're always getting the most up-to-date data.

Low cost & scalable

Intelligent prioritization also means lower compute costs and extreme scalability.

Use any data source

PointScrape's flexibility means any data type and source can be turned into a dataset, whether it's HTML, JSON, or custom formats of product, news, or review data.

Our use cases

PointScrape data powers many ML and data capabilities:

LLM and NLP fine-tuning

Adapting state-of-the-art models requires significant in-domain datasets that PointScrape can help you build.

Owned-data enrichment

Enrich your owned data with context gathered from the web, like highlighting nearby highly rated restaurants or popular posts about your subject.

Competitive intelligence

Gather intelligence on your market and competition, like reviews, prices, and availability for competitor products.

Some stats about us

121M

New rows added/month

New rows include news articles, reviews, products, product images, etc.

1.55 days

Median observed data age

PointScrape's intelligent prioritization ensures you get the most recent data soon after it appears.

25,188

Dataset downloads on Kaggle

PointScrape datasets make up some of the most popular datasets on Kaggle.

The Process

Creating a dataset with PointScrape is a simple 3-step process:

Free Consultation

Provide your use case and examples of websites you'd like to scrape, and we'll advise you on how the dataset can be created and give you a free quote for services.

Dataset Creation

Using your requirements, we set up PointScrape to build your dataset, with scheduled check-ins to make sure the produced dataset is just right.

Set & Forget

PointScrape continues gathering data after initial dataset creation, with a full suite of monitoring and alarms to keep it running.

Automated topic classification with Taggit

Taggit's text classification accurately identifies themes in scraped text. Integration is simple, and Taggit predictions help you quantify or identify topics critical to your business.

Schedule a Call

A short consultation is the first step to building your custom dataset.

Frequently Asked Questions

About Us

ThoughtVector has industry-leading experience in NLP, machine learning engineering, and web scraping to help you meet all your data and machine learning needs.

"We've entered an unprecedented age of opportunity for data and machine learning. Don't let data moats hamstring your company's growth and market position."

Stuart Axelbrooke

Founder

Previously worked at