Read this first
- This is a heuristic built from publicly documented crawl windows. It is not a definitive check against any specific dataset or model.
- No data leaves your browser. Nothing is uploaded, logged, or sent to a server. The check runs entirely on this page.
- A “likely” result means a corpus’s crawl window overlaps your publish date and the corpus type matches. It does not mean any model has memorized your work.
- An “unlikely” result means the dates don’t line up. It does not guarantee absence — private datasets and licensing deals are out of scope here.
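The two rules above can be sketched as a small local function. This is a hypothetical illustration, not the page's actual code. It assumes a work counts as online from its publish date onward, so its availability overlaps a crawl window exactly when it was published on or before the window's last day. The type check is treated as a filter, and the corpus names, dates, and content types are made-up placeholders.

```typescript
// Minimal sketch of the overlap heuristic; not the page's actual code.
// Assumption: a work counts as online from its publish date onward, so it
// overlaps a crawl window iff it was published on or before the window closed.
// Corpus names, dates, and content types below are hypothetical.

type ContentType = "web" | "code" | "books";
type Verdict = "likely" | "unlikely";

interface Corpus {
  name: string;
  types: ContentType[]; // kinds of content the corpus is documented to collect
  crawlStart: Date;     // first day of the documented crawl window
  crawlEnd: Date;       // last day of the documented crawl window
}

function datesLineUp(corpus: Corpus, published: Date): boolean {
  return published.getTime() <= corpus.crawlEnd.getTime();
}

// Pure local computation: no network calls, nothing is uploaded.
function checkAll(corpora: Corpus[], published: Date, kind: ContentType): Map<string, Verdict> {
  const results = new Map<string, Verdict>();
  for (const corpus of corpora) {
    // Assumption: corpora that don't collect this content type are skipped,
    // since the type condition for "likely" can never be met.
    if (!corpus.types.includes(kind)) continue;
    results.set(corpus.name, datesLineUp(corpus, published) ? "likely" : "unlikely");
  }
  return results;
}

const exampleCorpora: Corpus[] = [
  { name: "ExampleWebCrawl", types: ["web"], crawlStart: new Date("2020-01-01"), crawlEnd: new Date("2020-12-31") },
  { name: "ExampleCodeDump", types: ["code"], crawlStart: new Date("2021-06-01"), crawlEnd: new Date("2022-03-31") },
];

const results = checkAll(exampleCorpora, new Date("2020-05-15"), "web");
console.log(results.get("ExampleWebCrawl")); // likely
console.log(results.has("ExampleCodeDump")); // false (type mismatch, skipped)
```

Filtering on type first keeps “unlikely” meaning only what the list says it means: the dates don’t line up.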
The corpora — what they swept, when
A reference list, hand-curated from each dataset’s public documentation, with links to the primary sources.
Opt-out and defense
None of these measures removes your work from datasets that have already been collected. They affect future training runs, future crawls, and future datasets. The internet doesn’t forget; you can only steer what comes next.
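The timing argument above can be made concrete by classifying crawl windows against the date an opt-out took effect. This is a hypothetical sketch: it assumes an opt-out can only influence crawls that run after it is in place, and only if the crawler honors it. The window names and dates are placeholders, not real corpora.

```typescript
// Sketch: where does a crawl window fall relative to an opt-out date?
// Assumption: an opt-out only influences fetches made after it is in place,
// and only if the crawler honors it. Window names and dates are hypothetical.

type OptOutEffect = "already-collected" | "in-progress" | "future";

interface CrawlWindow {
  name: string;
  crawlStart: Date;
  crawlEnd: Date;
}

function optOutEffect(w: CrawlWindow, optOutDate: Date): OptOutEffect {
  const t = optOutDate.getTime();
  // Window closed before the opt-out: whatever was swept stays swept.
  if (w.crawlEnd.getTime() < t) return "already-collected";
  // Window opens on or after the opt-out: the opt-out can apply, if honored.
  if (w.crawlStart.getTime() >= t) return "future";
  // Opt-out landed mid-crawl: depends on when your pages were fetched.
  return "in-progress";
}

const exampleWindows: CrawlWindow[] = [
  { name: "ExampleCrawl2022", crawlStart: new Date("2022-01-01"), crawlEnd: new Date("2022-12-31") },
  { name: "ExampleCrawl2024", crawlStart: new Date("2024-01-01"), crawlEnd: new Date("2024-06-30") },
];

const optOut = new Date("2023-03-01");
for (const w of exampleWindows) {
  console.log(`${w.name}: ${optOutEffect(w, optOut)}`);
}
// ExampleCrawl2022: already-collected
// ExampleCrawl2024: future
```

Only the “future” bucket is something an opt-out can change; the “already-collected” bucket is exactly what no opt-out can recall.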