To assist researchers and policymakers focusing on the determinants and impacts of artificial intelligence (AI) invention, the Office of the Chief Economist (OCE) updated the Artificial Intelligence Patent Dataset (AIPD 2023) to identify which of the 15.4 million U.S. patent documents (patents and pre-grant publications, or PGPubs) published from 1976 through 2023 contain AI.
The AIPD 2023 was created from the original AIPD framework and incorporates several improvements from the recent patent landscaping literature. For example, OCE now integrates BERT for Patents (Devlin et al. 2018; Srebrovic and Yonamine 2020) into the machine learning architecture it uses to identify AI in patent documents (originally based on Abood and Feltenberger 2018 and extended in Giczy et al. 2022 and Islam Erana and Finlayson 2024). Additionally, OCE overcame a limitation of the Abood and Feltenberger (2018) “expansion method” used to create the training dataset for the original AIPD (Giczy et al. 2022) by including training observations closer to the “decision boundary” between AI and not AI, thereby enabling the model to learn from patent documents that are more difficult to classify.
An article describing the AIPD 2023 is available and can be cited as: Pairolero, N., Giczy, A., Torres, G., Islam Erana, T., Finlayson, M., and Toole, A. 2024. The Artificial Intelligence Patent Dataset (AIPD) 2023 update.
Release notes: The AIPD 2023 was updated on January 8, 2025 to fix a minor issue affecting the predict93* variables (the initial AIPD 2023 mistakenly used a threshold of 89.93% for these variables instead of 93%).
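To illustrate what the threshold correction means in practice, the following minimal sketch shows how a binary prediction flag could be derived from a model score at a chosen cutoff. The function name, the 0 to 1 score scale, and the inclusive `>=` comparison are assumptions made for illustration; they are not taken from the AIPD documentation.

```python
def flag_at_threshold(score: float, threshold: float = 0.93) -> int:
    """Return 1 if the score meets the cutoff, else 0 (hypothetical helper)."""
    # Assumption: scores are on a 0-1 scale and the cutoff is inclusive
    return 1 if score >= threshold else 0


# A hypothetical document scored at 0.91 sits between the two cutoffs
score = 0.91
flag_initial = flag_at_threshold(score, threshold=0.8993)  # cutoff used by mistake
flag_corrected = flag_at_threshold(score, threshold=0.93)  # intended cutoff
print(flag_initial, flag_corrected)
```

Under these assumptions, a document scoring between 89.93% and 93% would have been flagged in the initial release but not after the correction.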
2020 original release
Also available is the original 2020 release of the AIPD. In addition to identifying the patents and PGPubs published through 2020 that contain AI, it includes a second data file that provides the patent documents used to train the machine learning models.
A working paper describing the original AIPD is available at SSRN and as a published version in the Journal of Technology Transfer. Users are requested to cite this documentation when using these data: Giczy, A.V., Pairolero, N.A. & Toole, A.A. Identifying artificial intelligence (AI) invention: a novel AI patent dataset. J Technol Transf (2022). https://doi.org/10.1007/s10961-021-09900-2.
The original AIPD was made possible through cross-business-unit collaboration among OCE, the Office of Policy and International Affairs, the Patents Business Unit, and the Office of the Chief Information Officer. The original AIPD was used in the USPTO report “Inventing AI: Tracing the diffusion of artificial intelligence with U.S. patents” and the Nature Biotechnology article “Discovering value: women’s participation in university and commercial AI invention.”
Release notes: The original AIPD was updated on August 2, 2021 to fix a minor issue affecting the 2019 and 2020 “vision” and “any_ai” predictions.
Data files
Download full set of 2020 data files [.dta format (512 MB)] [.tsv format (1.03 GB)]
Download individual data files:
File Name | 2020 (.dta)* | 2020 (.tsv) | 2023 (.dta)* | 2023 (.csv) |
---|---|---|---|---|
ai_model_predictions | 496 MB | 1.02 GB | 649 MB | 764 MB |
ai_model_training_doc_seedgroups | 16.2 MB | 14.3 MB | No data | No data |
- Note: the 2020 and 2023 .dta files are saved in the Stata-14 format.
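
Because the prediction files are large, it may be convenient to stream the delimited versions row by row rather than load them into memory at once. The sketch below uses only Python's standard `csv` module. The function name, the file path, the column name in the example call, and the assumption that a positive prediction is stored as the text `1` are all hypothetical and should be checked against the downloaded file.

```python
import csv


def count_flagged(path: str, flag_column: str, delimiter: str = ",") -> tuple[int, int]:
    """Stream a delimited file and return (rows_flagged, rows_total)."""
    flagged = 0
    total = 0
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        for row in reader:
            total += 1
            # Assumption: a positive prediction is stored as the text "1"
            if row[flag_column].strip() == "1":
                flagged += 1
    return flagged, total


# Example call (hypothetical file and column names):
# flagged, total = count_flagged("ai_model_predictions.csv", "predict93_any_ai")
```

For the 2020 .tsv file, the same function could be called with `delimiter="\t"`.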
For questions, please email EconomicsData@uspto.gov.