Artificial Intelligence Patent Dataset

To assist researchers and policymakers studying the determinants and impacts of artificial intelligence (AI) invention, the Office of the Chief Economist (OCE) updated the Artificial Intelligence Patent Dataset (AIPD) to create the AIPD 2023. It identifies which of the 15.4 million U.S. patent documents (patents and pre-grant publications, or PGPubs) published from 1976 through 2023 contain AI.

The AIPD 2023 builds on the original AIPD framework and incorporates several improvements from the recent patent landscaping literature. For example, OCE now incorporates BERT for Patents (Devlin et al. 2018; Srebrovic and Yonamine 2020) into the machine learning architecture it uses to identify AI in patent documents (originally based on Abood and Feltenberger 2018 and extended in Giczy et al. 2022 and Islam Erana and Finlayson 2024). Additionally, OCE overcame a limitation of the Abood and Feltenberger (2018) “expansion method,” which was used to create the training dataset for the original AIPD (Giczy et al. 2022). It did so by including training observations closer to the “decision boundary” between AI and not AI, which lets the model learn from patent documents that are more difficult to classify.
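The idea of training observations near the “decision boundary” can be illustrated with a minimal sketch. It assumes a classifier that outputs a score in [0, 1] and labels a document as AI when the score is at least 0.5. Under that assumption, documents whose scores fall within a small margin of the threshold are the ambiguous, hard-to-classify cases. The function name `near_boundary`, the 0.5 threshold, and the 0.1 margin are hypothetical. This illustrates the concept only and is not OCE’s actual sample-selection procedure, which is documented in the AIPD 2023 article.

```python
from typing import Dict, List


def near_boundary(
    scores: Dict[str, float],
    threshold: float = 0.5,  # hypothetical AI / not-AI cutoff
    margin: float = 0.1,     # hypothetical width of the "ambiguous" band
) -> List[str]:
    """Return IDs of documents whose score lies within `margin` of `threshold`.

    Results are ordered from most ambiguous (closest to the threshold) to
    least ambiguous, with ties broken by document ID.
    """
    candidates = [
        doc_id for doc_id, s in scores.items() if abs(s - threshold) <= margin
    ]
    return sorted(candidates, key=lambda d: (abs(scores[d] - threshold), d))


if __name__ == "__main__":
    # Synthetic scores for illustration only; not AIPD data.
    demo = {"a": 0.95, "b": 0.55, "c": 0.48, "d": 0.05, "e": 0.62}
    print(near_boundary(demo))
```

Confident predictions such as 0.95 or 0.05 add little new information to a training set. Scores near the threshold mark the examples a classifier is most likely to get wrong.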

An article describing the AIPD 2023 is available and can be cited as: Pairolero, N., Giczy, A., Torres, G., Islam Erana, T., Finlayson, M., and Toole, A. 2024. The Artificial Intelligence Patent Dataset (AIPD) 2023 update. 

2020 original release

The original 2020 release of the AIPD is also available. In addition to identifying the patents and PGPubs published through 2020 that contain AI, it includes a second data file listing the patent documents used to train the machine learning models.

A working paper describing the original AIPD is available on SSRN, and a published version appears in the Journal of Technology Transfer. Users are asked to cite this documentation when using these data: Giczy, A.V., Pairolero, N.A. & Toole, A.A. Identifying artificial intelligence (AI) invention: a novel AI patent dataset. J Technol Transf (2022). https://doi.org/10.1007/s10961-021-09900-2

The original AIPD was made possible through collaboration across USPTO business units: OCE, the Office of Policy and International Affairs, the Patents Business Unit, and the Office of the Chief Information Officer. The original AIPD was used in the USPTO report “Inventing AI: Tracing the diffusion of artificial intelligence with U.S. patents” and the Nature Biotechnology article “Discovering value: women’s participation in university and commercial AI invention.”

Release notes: The original AIPD was updated on August 2, 2021 to fix a minor issue affecting the 2019 and 2020 “vision” and “any_ai” predictions.

Data files

Download the full set of 2020 data files: .dta format (512 MB) or .tsv format (1.03 GB).

Download individual data files:

File name                          2020*                            2023*
ai_model_predictions               DTA (496 MB); TSV (1.02 GB)      DTA (649 MB); CSV (764 MB)
ai_model_training_doc_seedgroups   DTA (16.2 MB); TSV (14.3 MB)     No data
*Note: The 2020 and 2023 .dta files are saved in Stata 14 format.
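As a hedged starting point for working with the ai_model_predictions files, the following sketch uses only the Python standard library. It reads either the tab-separated 2020 file or the comma-separated 2023 file and counts documents flagged as AI by publication year. The column names `pub_dt` and `predict50_any_ai` are assumptions for illustration, as is the assumption that the flag is coded 1 for AI and that dates begin with a four-digit year. Confirm these against the header row of the downloaded file. The .dta versions can instead be read with `pandas.read_stata`.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def ai_counts_by_year(
    f: TextIO,
    flag_col: str = "predict50_any_ai",  # hypothetical column name; check the file header
    date_col: str = "pub_dt",            # hypothetical column name; check the file header
    delimiter: str = "\t",               # "\t" for the 2020 TSV, "," for the 2023 CSV
) -> Counter:
    """Count documents whose AI flag is 1, grouped by publication year.

    Rows are streamed one at a time, so the whole file is never held in memory.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(f, delimiter=delimiter):
        if row[flag_col].strip() in ("1", "1.0"):
            year = row[date_col][:4]  # assumes dates start with the year, e.g. 2019-05-14
            counts[year] += 1
    return counts


if __name__ == "__main__":
    # Tiny synthetic sample standing in for the real file; not AIPD data.
    sample = (
        "doc_id\tpub_dt\tpredict50_any_ai\n"
        "A1\t2019-01-08\t1\n"
        "A2\t2019-03-05\t0\n"
        "A3\t2020-06-02\t1\n"
    )
    print(ai_counts_by_year(io.StringIO(sample)))
    # To read a real download (the file name here is hypothetical):
    # with open("ai_model_predictions.tsv", newline="", encoding="utf-8") as fh:
    #     print(ai_counts_by_year(fh))
```

Streaming rows with `csv.DictReader` keeps memory use modest even for files of roughly 1 GB. It avoids loading the full table, as a DataFrame-based approach would.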


For questions, please email EconomicsData@uspto.gov.