United States Patent and Trademark Office OG Notices: 12 July 2005
DEPARTMENT OF COMMERCE
Patent and Trademark Office
37 CFR Part 1
[Docket No.: 2005-P-062]
RIN 0651-AB91
Acceptance, Processing, Use and Dissemination of Chemical and Three-Dimensional Biological Structural Data in Electronic Format
AGENCY: United States Patent and Trademark Office, Commerce.
ACTION: Advance notice of proposed rule making.
SUMMARY: This advance notice of proposed rule making is to inform the public that the United States Patent and Trademark Office (USPTO) is considering amending its rules of practice to require submission of chemical and three-dimensional (3-D) biological structural data in electronic format. The USPTO anticipates that requiring submission of chemical and 3-D biological structural data in electronic format in patent applications will improve the processing and examination of patent applications that include such data, as well as the dissemination of such data to searchable public databases. The purpose of this notice is to encourage comments on this topic, in the form of responses to the questions posed in this notice, from industry, academia, the patent bars, and members of the public.
Comment Deadline Date: To be ensured of consideration, written comments must be received on or before August 22, 2005. No public hearing will be held.
ADDRESSES: Comments should be sent by electronic mail message over the Internet addressed to AB91.Comments@uspto.gov. Comments may also be submitted by mail addressed to: Mail Stop Comments - Patents, Commissioner for Patents, P.O. Box 1450, Alexandria, VA, 22313-1450, or by facsimile to (571) 273-3373, marked to the attention of Lisa J. Hobbs, Ph.D., Search Systems Project Manager, Search and Information Resources Administration, Office of the Deputy Commissioner for Patent Resources and Planning. Although comments may be submitted by mail or facsimile, the Office prefers to receive comments via the Internet.
If comments are submitted by mail, the Office prefers that the comments be submitted on a DOS formatted 3 1/2 inch disk accompanied by a paper copy. Comments may also be sent by electronic mail message over the Internet via the Federal eRulemaking Portal. See the Federal eRulemaking Portal Web site (http://www.regulations.gov) for additional instructions on providing comments via the Federal eRulemaking Portal. The comments will be available for public inspection at the Office of the Commissioner for Patents, located in Madison East, Tenth Floor, 600 Dulany Street, Alexandria, Virginia, and will be available through anonymous file transfer protocol (ftp) via the Internet (http://www.uspto.gov). Because comments will be made available for public inspection, information that the submitter does not desire to make public, such as an address or phone number, should not be included in the comments.
FOR FURTHER INFORMATION CONTACT: Lisa J. Hobbs, Ph.D., Search Systems Project Manager, Search and Information Resources Administration, Office of the Deputy Commissioner for Patent Resources and Planning, by telephone at (571) 272-3373, by mail addressed to: Box Comments - Patents, Commissioner for Patents, P.O. Box 1450, Alexandria, VA 22313-1450, or by facsimile to (571) 273-3373, marked to the attention of Lisa J. Hobbs.
SUPPLEMENTARY INFORMATION:
1. General Background Information:
It is becoming increasingly apparent that the USPTO needs to begin investigation of procedures for the submission, screening, processing, storing, searching, analysis and dissemination of chemical and 3-D biological structural data in appropriate electronic formats. The rate at which these data are being generated is poised to increase by several orders of magnitude in the coming years as significant advances are being made in the ability to readily determine structural information.
Initiatives to fund research in these areas are being supported by numerous governmental agencies as well as private industry entities. With the advancement of capabilities allowed by automation, the number of public and private databases hosting these types of data for information exchange is growing daily. It has yet to be determined whether or not the USPTO will receive an increasing number of applications comprising 3-D crystal data and/or chemical structure data. However, the USPTO currently receives a significant amount of chemical structure data, and has begun to receive some very large submissions of 3-D protein crystal data. Consequently, the USPTO has decided to begin the planning and coordination of how best to provide the capability to manage, process, search, and disseminate this information as appropriate. Similar to the process involved in the promulgation of the sequence rules (37 CFR 1.821-1.825 and WIPO ST.25), the USPTO intends to work with other international intellectual property offices in developing any new standards for the submission of chemical or 3-D structural data in electronic format. In an effort to facilitate public comment on the questions set forth below, the following additional background information is provided:
2. Background Specific to 3-D Biological Structural Data:
X-ray crystallographic studies and nuclear magnetic resonance (NMR) spectroscopy studies of biological macromolecules provide mechanisms for obtaining detailed 3-D structural information. The current scientific priorities, and concomitant intellectual property priorities, of many laboratories include using 3-D protein crystal data to assist in unraveling the complex relationship between sequence, structure, and function. Knowledge of the 3-D structures of biological macromolecules is an essential element for guiding studies and developing an understanding of biological processes.
Three-dimensional structural coordinate data provide essential information that can be exploited for protein engineering, rational drug design, and other biotechnology efforts (Gilliland, et al. 1996 J. Res. Natl. Inst. Stand. Technol. 101: 309-320). Bioinformatics, the collection and use of scientific database entries to predict the structure or behavior or evolutionary relatedness of particular biological macromolecules based on sequence similarity or structural similarity to known macromolecules, is one of the fastest growing scientific disciplines. The ability of the scientific community to "data mine" known scientific information is directly dependent on the public availability of all prior art data. The worldwide Protein Data Bank (wwPDB; http://www.wwpdb.org/index.html) is a collection of all publicly available 3-D structure data of large molecules of proteins and nucleic acids, experimentally determined by X-ray crystallography and NMR, which is freely and publicly available to the global community. The PDB, which is under the oversight of the Research Collaboratory for Structural Bioinformatics (RCSB, USA), the Macromolecular Structure Database (MSD) at the European Bioinformatics Institute (EBI) and the Protein Data Bank Japan (PDBj) at the Institute for Protein Research, has grown from 7 structures in 1971 to a database containing over 30,900 structures as of May 2005. The PDB's growth has been accompanied by increases in both data content and the structural complexity of individual entries. A further acceleration in growth is anticipated as the result of developments in high-throughput structural determination methodologies and worldwide structural genomics efforts (Westbrook, et al. 2003 Nucl. Acids Res. 31(1): 489-491). There are also many secondary sources of 3-D protein crystal data and associated information.
One of these is the Molecular Modeling Database (MMDB), maintained as part of the Entrez search system by the National Center for Biotechnology Information (NCBI; http://www.ncbi.nlm.nih.gov/), which is a compilation of all of the PDB 3-D structures of biomolecules and additionally integrates value-added chemical, sequence and structural information in order to facilitate structure-based homology modeling and protein structure prediction. The goal of Entrez's 3-D-structure database is to make protein crystal structure information, and the functional annotation MMDB adds, easily accessible to molecular biologists (Wang, et al. 2002 Nucl. Acids Res. 30(1): 249-252). All of the major 3-D protein crystal databases use a variant of the Crystallographic Information File (CIF) format as the means for obtaining data entries with proper annotation. Ratified in 1990 by the International Union of Crystallography (IUCr), CIF is a format that enables the characterization of small crystal structures. In 1997, the CIF format was modified to include information specific to macromolecules, resulting in version 1.0 of the macromolecular Crystallographic Information File (mmCIF) dictionary (Bourne, et al. 1997 Meth. Enzymol. 277: 571-590). The PDB database initially accepted files in a proprietary pdb format in 1971, but has now moved to accepting all files, and converting the backfile, into mmCIF. Some databases, especially those involved in secondary, value-added information, have further modified the mmCIF format to include more data fields and annotations. MMDB uses the ASN.1 format, which is specific to the NCBI and addresses structural and functional linkages. The ASN.1 format also allows for a 3-D viewer to be used to visualize the protein crystal. In addition to databases containing information on the crystal structures of biomolecules, there are major repositories for other types of crystal structures.
The Cambridge Structural Database (CSD), maintained by the Cambridge Crystallographic Data Centre (CCDC; http://www.ccdc.cam.ac.uk/), is a worldwide repository of small molecule crystal structures and has over 300,000 organic and metallo-organic compound records. The CSD database accepts entries in the CIF data format in plain ASCII text. Repositories for other types of crystal structures include: the Nucleic Acid Database (NDB; http://ndbserver.rutgers.edu/), which stores oligonucleotides; the Inorganic Crystal Structure Database (ICSD; http://www.fiz.informationsdienste.de/en/DB/icsd/); and CRYSTMET(R) (http://www.tothcanada.com/), which stores metals and alloys.
3. Background Specific to Chemical Structural Data:
While the use of drawings to denote specific molecular relationships and chemical bonds is a very old art, the embodiments and uses of these drawings are evolving rapidly as supporting technology evolves. Two main methods for handling chemical data are: chemical drawing systems that depend on annotations added to unique substance records, in specific electronic file-types; and text files that are a compilation of unique data determining a canonical representation. Electronic files containing drawings created by chemical drawing software would provide the most accessible data set for processing, use in searching, and public dissemination. However, there is currently no single, publicly available software package that has been accepted as the standard for this type of drawing. Some publicly available chemical data depiction systems are: (1) SMILES (http://www.daylight.com/dayhtml/smiles/); (2) SMARTS/SMIRKS (http://www.daylight.com/dayhtml/doc/theory/theory.rxn.html#RTFrxn18); (3) ACD ChemSketch (http://www.acdlabs.com/download/); and (4) MDL ISIS/Draw (http://www.mdli.com/downloads/downloadable/index.jsp).
Some proprietary chemical data depiction systems are: (1) ChemDraw (http://www.cambridgesoft.com/products/family.cfm?FID=2); (2) ACD/Name (http://www.acdlabs.com/products/name_lab/); (3) Chemistry 4-D Draw (http://www.cheminnovation.com/products/chem4d.asp); and (4) ChemWindow (http://www.bio-rad.com/). One of the difficulties facing the USPTO in moving toward acceptance of chemical drawings in electronic format is the preponderance of proprietary software and file-types. Prior to filing a patent application, many applicants have already created drawings of chemical structures of interest for publication or presentation purposes; however, these drawings could be in one of many publicly available file-types, or in a file-type specific to a particular software product. It is not possible to require applicants to purchase proprietary drawing software, nor is it possible to accept and handle all possible file-types. One alternative to requiring a non-standard publicly available format, requiring a proprietary format, or accepting a multiplicity of drawing file-types would be the use of a standardized text format to describe a chemical structure. Two possibilities for this type of file are: Chemical Markup Language (CML; http://www.xml-cml.org/), or a joint effort currently under way between the International Union of Pure and Applied Chemistry and the National Institute of Standards and Technology, the IUPAC-NIST Chemical Identifier (INChI; http://www.iupac.org/projects/2000/2000-025-1-800.html). A description of INChI states that it would enable an automatic conversion to a graphical representation of a chemical substance that could be performed anywhere in the world, and could be built into desktop chemical structure drawing packages and on-line chemical structure drawing applets (A.J. McNaught 2001 http://www.iupac.org/nomenclature/chem_id_project.html).
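To illustrate what a standardized text format for a chemical structure could look like in practice, the following is a minimal sketch in Python. It assumes a small CML-style XML record for water; the element and attribute names (molecule, atomArray, atom, elementType, bondArray, bond, atomRefs2) follow common CML usage but the record is hypothetical and has not been validated against any CML schema. The helper name summarize is likewise an illustration only and is not part of any USPTO, IUPAC, or CML tool.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical CML-style record for water (an assumption for illustration,
# not validated against a CML schema and not taken from the notice).
CML_TEXT = """\
<molecule id="water">
  <atomArray>
    <atom id="a1" elementType="O"/>
    <atom id="a2" elementType="H"/>
    <atom id="a3" elementType="H"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a1 a3" order="1"/>
  </bondArray>
</molecule>
"""


def summarize(cml_text):
    """Return (element counts, list of bonded atom-id pairs) from the record."""
    root = ET.fromstring(cml_text)
    # Map each atom id to its element symbol.
    atoms = {a.get("id"): a.get("elementType") for a in root.iter("atom")}
    # Each bond names the two atom ids it connects, separated by a space.
    bonds = [tuple(b.get("atomRefs2").split()) for b in root.iter("bond")]
    return Counter(atoms.values()), bonds


counts, bonds = summarize(CML_TEXT)
print(dict(counts))
print(bonds)
```

Because such a record is plain text, it can be read with general-purpose tools rather than a particular vendor's drawing package, which is the property the passage above points to. A production format would also need to address stereochemistry, charges, and variable (Markush) groups, none of which this sketch attempts.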
Rule Making Considerations
Executive Order 13132: This rule making does not contain policies with federalism implications sufficient to warrant preparation of a Federalism Assessment under Executive Order 13132 (Aug. 4, 1999).
Executive Order 12866: This rule making has been determined to be not significant for purposes of Executive Order 12866 (Sept. 30, 1993).
Paperwork Reduction Act: This notice involves information collection requirements which are subject to review by the Office of Management and Budget (OMB) under the Paperwork Reduction Act of 1995 (44 U.S.C. 3501 et seq.). The collections of information involved in this notice have been reviewed and previously approved by OMB under OMB control numbers: 0651-0022, 0651-0024, 0651-0031, and 0651-0032. The principal impact of the changes under consideration in this advance notice would be to revise the rules of practice to require or provide for the submission of chemical and three-dimensional (3-D) biological structural data in electronic form. The Office is not resubmitting any information collection package to OMB for its review and approval because this advance notice does not propose any changes that would affect the information collection requirements associated with the information collection under these OMB control numbers. If the Office proceeds with proposing changes to the rules of practice relating to the submission of chemical and three-dimensional (3-D) biological structural data in electronic form, the Office will resubmit an information collection package to OMB for its review and approval for any collections of information whose requirements will be revised as a result of the proposed rule changes. Interested persons are requested to send comments regarding these information collections, including suggestions for reducing this burden, to Robert J. Spar, Director, Office of Patent Legal Administration, Commissioner for Patents, P.O.
Box 1450, Alexandria, VA 22313-1450, or to the Office of Information and Regulatory Affairs, Office of Management and Budget, New Executive Office Building, Room 10235, 725 17th Street, N.W., Washington, D.C. 20503, Attention: Desk Officer for the Patent and Trademark Office. Notwithstanding any other provision of law, no person is required to respond to nor shall a person be subject to a penalty for failure to comply with a collection of information subject to the requirements of the Paperwork Reduction Act unless that collection of information displays a currently valid OMB control number.
4. Comments on the following Questions and Any Other Related Matters Are Solicited:
A. Questions Pertaining to the Creation of 3-D Structural Data Files
1. What benefits do you foresee for the applicant if electronic filing is adopted? What disadvantages do you foresee?
2. What types of 3-D data would be best submitted electronically? Examples:
   - Small organic crystals.
   - Macromolecular peptide/protein crystals.
   - Inorganic crystals.
   - Metallic crystals.
   - Other.
3. Should electronic submission of 3-D data be mandatory, optional, or mandatory for some types (e.g., protein crystals) and optional for others (e.g., small organic crystals)?
4. If electronic submission is mandatory, should the USPTO require all 3-D information cited in an application to be submitted in electronic format, including prior art, or only new data?
5. Have tables of 3-D data generally been created for other purposes before preparation of a patent application, e.g., for publication in a scientific journal or submission to a database? If so,
   - What format(s) are used (e.g., mmCIF, pdb, CIF, other)?
   - What authoring tool is used to create the files, e.g., ADIT (http://pdb.rutgers.edu/mmcif/ADIT/index.html)?
   - What software, if any, is used to validate files of 3-D data, e.g., ADIT Validation Tool or enCIFer (http://www.ccdc.cam.ac.uk/free_services/encifer/)?
6. Have most of the 3-D tables been submitted to a database before inclusion in a patent application? If so, which one? Examples:
   - http://www.ccdc.cam.ac.uk/products/csd/
   - http://www.rcsb.org/pdb/
   - http://www.fiz-informationsdienste.de/en/DB/icsd/
   - http://www.tothcanada.com/
7. Have most of the 3-D tables been published before inclusion in a patent application?
8. Database providers require certain annotation data. Would any of the annotation data currently required by 3-D database providers be unknown or proprietary at the time of filing a patent application (e.g., method used for crystal creation)?
9. Database providers often establish a controlled vocabulary for annotation or feature description information. Would there be any problems created during patent application prosecution if the electronic file relied on dynamic controlled dictionaries or vocabularies, controlled and maintained by database providers, not the USPTO, for the description of features, etc.? What would be the pros and cons if the USPTO were to incorporate by reference a public database controlled vocabulary into any adopted standard? Examples:
   - http://pdb.rutgers.edu/cc_dict_tut.html
   - http://ndbserver.rutgers.edu/mmcif/dictionaries/index.html
10. Is there annotation data specific to a patent application that does not appear in public database files but that would be desirable to provide for an electronic submission in a patent application (e.g., continuing application data, attorney's docket number)?
11. Do many/most file wrapper submissions with 3-D data contain multiple 3-D tables?
B. Questions Pertaining to the USPTO Receipt of 3-D Files
1. In general, 3-D structure data tables submitted as part of a patent application are quite lengthy. Should the USPTO require that all 3-D files greater than a certain size be submitted in electronic media only?
2. Should the USPTO require submission in electronic format at the time of filing, or, if a paper copy is filed, permit the electronic submission to be filed later (with a statement indicating that the electronic version is the same as the version originally filed)?
3. Should any statement that comes with an electronic file outline the authoring tool and certify the use of a validation tool?
4. Should the rules be revised to specify that 3-D biological structural data, if a paper copy is provided, is to appear in a special section, e.g., between the specification and the Sequence Listing?
C. Questions Pertaining to the Use of 3-D Electronic Files by the USPTO Examiners/STIC Personnel
1. If enough patent applications are filed directed to 3-D structures to go forward with pursuing search capability (a 3-D file search, not the standard sequence search and text search already performed) of some sort, what databases should be investigated?
2. What software viewer would be recommended for visual interpretation of the text tables? Examples:
   - http://www.ncbi.nlm.nih.gov/Structure/CN3-D/cn3-D.shtml
   - http://products.cambridgesoft.com/ProdInfo.cfm?pid=285
   - http://www.proteinscope.com/
   - http://www.candomultimedia.com/medical/
D. Questions Pertaining to 3-D File Export to a Public Database Partner
1. If the USPTO receives 3-D structural data in electronic form, the USPTO would likely be able to export the data to a searchable public database upon publication of the application or patent grant. What databases should be investigated for a USPTO export arrangement?
2. Would public databases be willing to work with the USPTO in developing acceptable formats and annotations, if that would be the best submission practice for applicants?
E. Questions Pertaining to the USPTO Publication of 3-D Files
1. Should all 3-D files be posted on the USPTO's Publication Site for Issued and Published Sequences (PSIPS; http://seqdata.uspto.gov/)?
2. Should the files be part of the text or image of the patent application publication or patent grant aside from electronic posting on PSIPS?
F. Question Pertaining to 3-D File Export to the USPTO Customers
The USPTO would be exporting in a new file-type; would this have an adverse or beneficial impact on the USPTO customers?
G. Questions Pertaining to the Creation of Chemistry Structural Data Files
1. What benefits do you foresee for the applicant if electronic filing is adopted? What disadvantages do you foresee?
2. Has a structural chemistry data file or drawing generally been created for other purposes before preparation of a patent application, e.g., for publication in a scientific journal or submission to a database? If so, in what format: .mol, .cdx, CML, INChI, other?
3. If drawing tools are used by applicants, which tools are generally used to create the files, e.g., ChemDraw, ISIS/Draw, ACD/Name?
   - http://www.cambridgesoft.com/products/family.cfm?FID=2
   - http://www.mdli.com/products/framework/isis_draw/index.jsp
   - http://www.acdlabs.com/products/name_lab/name/
4. Is there annotation data that should be added to the drawings? What annotations? How would applicants prefer to add additional data?
5. Possibly applicants want to cite inventors, attorneys, continuing application data, attorney's docket number, etc.?
6. Should the USPTO require all structures cited in a patent application be submitted in electronic format? Only new data (not prior art)? Only a representative drawing? Only the "actual invention" after restriction of the claims and election of an invention?
7. Would a single representation be deemed a limitation to applicant's disclosure?
8. Do many/most file wrapper submissions with chemical structures contain multiple chemical structure drawings?
9. Have any chemical drawings generally been submitted to a public entity (e.g., a database or journal) before the filing of a patent application?
10. Have most of the drawings been published before the filing of a patent application?
11. Would it be a hardship for applicants if the USPTO required drawings in a proprietary software format?
12. Would it be a hardship for applicants if the USPTO required drawings in a text format that is not yet supported by the major drawing software tools?
   - How well known is the CML format?
     http://www.xml-cml.org/
   - How well known is the INChI format?
     http://www.iupac.org/publications/ci/2001/may/project_2000-025-1-050.html
     http://www.iupac.org/projects/2000/2000-025-1-800.html#clip
13. What is the state of the art for chemical drawings?
   - http://www.iupac.org/publications/ci/2002/2404/XML.html
H. Questions Pertaining to the USPTO Receipt of Chemistry Structure Files
1. Chemical structure data received by the USPTO varies widely in size. Should the USPTO require that all chemical structure files greater than a certain size be submitted in electronic media only?
2. Should the USPTO require submission in electronic format at the time of filing, or, if a paper copy is filed, permit the electronic submission to be filed later (with a statement indicating that the electronic version is the same as the version originally filed)?
3. Should the rules be revised to specify that chemical structure data, if a paper copy is supplied, is to appear in a special section, e.g., between the specification and the Sequence Listing, or as part of the drawings?
4. Chemical structures are often presented in the specification and claims in Markush format wherein a basic structure is defined, but portions thereof are variable. Are there drawing tools available that accurately render these types of structures? If not, what approach should the USPTO take to ensure that the data submitted appropriately reflects the invention described or claimed in the patent application? For example, the USPTO could require: An "exemplary" drawing at the time of filing; a drawing at the time of a restriction election, e.g., a single embodiment of a Markush claim; or, possibly multiple drawings.
5. The USPTO needs to have certain data associated with files. Since there is no annotation data in chemical drawing files, should the USPTO require a "read me" text file to accompany the drawing file? Should the title of the file be the name of the drawing?
I. Question Pertaining to the Use of Chemistry Structure Files by the USPTO Examiners/STIC Personnel
If a chemical structure drawing were required at the time of filing, how often might it have so many variables (that may be subject to a restriction/election requirement) that it cannot be effectively searched? If this is likely to be problematic, how can the USPTO effectively require submission of a representative drawing to be searched and, possibly, published?
J. Questions Pertaining to Chemistry Structure File Export to a Public Database Partner
1. Should the USPTO send chemical structure data files to a public database partner? If so, which one(s)?
2. Should the USPTO export data to CAS for inclusion in the Registry file? What about other private providers?
   - http://www.cas.org/EO/regsys.html
K. Question Pertaining to the USPTO Publication of Chemistry Structure Files
1. Should all chemistry structure files be posted on the USPTO's Publication Site for Issued and Published Sequences (PSIPS; http://seqdata.uspto.gov/), or should the chemistry drawing be published with the TIFF images of the patent application publication or patent grant?
L. Question Pertaining to Chemistry Structure File Export to the USPTO Customers
1. Should we change the drawing files that are sent to the USPTO customers?
   - Currently, .cdx, .mol, and TIFF versions are present (Note: common to Patent and Trademark Applications)
June 15, 2005
JON W. DUDAS
Under Secretary of Commerce for Intellectual Property and Director of the United States Patent and Trademark Office