United States Patent and Trademark Office OG Notices: 12 July 2005

                            DEPARTMENT OF COMMERCE
                          Patent and Trademark Office
                                 37 CFR Part 1
                           [Docket No.: 2005-P-062]
                                 RIN 0651-AB91

               Acceptance, Processing, Use and Dissemination of
                  Chemical and Three-Dimensional Biological
                     Structural Data in Electronic Format

AGENCY: United States Patent and Trademark Office, Commerce.

ACTION: Advance notice of proposed rule making.

SUMMARY: This advance notice of proposed rule making is to inform the
public that the United States Patent and Trademark Office (USPTO) is
considering amending its rules of practice to require submission of
chemical and three-dimensional (3-D) biological structural data in
electronic format. The USPTO anticipates that requiring submission of
chemical and 3-D biological structural data in electronic format in
patent applications will improve the processing and examination of
patent applications that include such data, as well as the
dissemination of such data to searchable public databases. The purpose
of this notice is to encourage comments on this topic, in the form of
responses to the questions posed in this notice, from industry,
academia, the patent bars, and members of the public.

   Comment Deadline Date: To be ensured of consideration, written
comments must be received on or before August 22, 2005. No public
hearing will be held.

ADDRESSES: Comments should be sent by electronic mail message over the
Internet addressed to AB91.Comments@uspto.gov. Comments may also be
submitted by mail addressed to: Mail Stop Comments - Patents,
Commissioner for Patents, P.O. Box 1450, Alexandria, VA, 22313-1450, or
by facsimile to (571) 273-3373, marked to the attention of Lisa J.
Hobbs, Ph.D., Search Systems Project Manager, Search and Information
Resources Administration, Office of the Deputy Commissioner for Patent
Resources and Planning. Although comments may be submitted by mail or
facsimile, the Office prefers to receive comments via the Internet. If
comments are submitted by mail, the Office prefers that the comments be
submitted on a DOS formatted 3 1/2 inch disk accompanied by a paper
copy.

   Comments may also be sent by electronic mail message over the
Internet via the Federal eRulemaking Portal. See the Federal
eRulemaking Portal Web site (http://www.regulations.gov) for additional
instructions on providing comments via the Federal eRulemaking Portal.

   The comments will be available for public inspection at the Office
of the Commissioner for Patents, located in Madison East, Tenth Floor,
600 Dulany Street, Alexandria, Virginia, and will be available through
anonymous file transfer protocol (ftp) via the Internet
(http://www.uspto.gov). Because comments will be made available for public
inspection, information that the submitter does not desire to make
public, such as an address or phone number, should not be included in
the comments.

FOR FURTHER INFORMATION CONTACT: Lisa J. Hobbs, Ph.D., Search Systems
Project Manager, Search and Information Resources Administration,
Office of the Deputy Commissioner for Patent Resources and Planning, by
telephone at (571) 272-3373, by mail addressed to: Box
Comments - Patents, Commissioner for Patents, P.O. Box 1450, Alexandria,
VA 22313-1450, or by facsimile to (571) 273-3373, marked to the
attention of Lisa J. Hobbs.

SUPPLEMENTARY INFORMATION:

   1. General Background Information: It is becoming increasingly
apparent that the USPTO needs to begin investigation of procedures for
the submission, screening, processing, storing, searching, analysis and
dissemination of chemical and 3-D biological structural data in
appropriate electronic formats. The rate at which these data are being
generated is poised to increase by several orders of magnitude in the
coming years as significant advances are being made in the ability to
readily determine structural information. Initiatives to fund research
in these areas are being supported both by numerous government
agencies and by private industry. With the advancement of
capabilities allowed by automation, the number of public and private
databases hosting these types of data for information exchange is
growing daily.

   It has yet to be determined whether or not the USPTO will receive
an increasing number of applications comprising 3-D crystal data and/or
chemical structure data. However, the USPTO currently receives a
significant amount of chemical structure data, and has begun to receive
some very large submissions of 3-D protein crystal data. Consequently,
the USPTO has decided to begin the planning and coordination of how
best to provide the capability to manage, process, search, and
disseminate this information as appropriate.

   Similar to the process involved in the promulgation of the sequence
rules (37 CFR 1.821-1.825 and WIPO ST.25), the USPTO intends to work
with other international intellectual property offices in developing
any new standards for the submission of chemical or 3-D structural data
in electronic format.

   In an effort to facilitate public comment on the questions set
forth below, the following additional background information is
provided:

   2. Background Specific to 3-D Biological Structural Data: X-ray
crystallographic studies and nuclear magnetic resonance (NMR)
spectroscopy studies of biological macromolecules provide mechanisms
for obtaining detailed 3-D structural information. The current
scientific priorities, and concomitant intellectual property
priorities, of many laboratories include using 3-D protein crystal data
to assist in unraveling the complex relationship between sequence,
structure, and function.

   Knowledge of the 3-D structures of biological macromolecules is an
essential element for guiding studies and developing an understanding
of biological processes. Three dimensional structural coordinate data
provide essential information that can be exploited for protein
engineering, rational drug design, and other biotechnology efforts
(Gilliland, et al. 1996 J. Res. Natl. Inst. Stand. Technol.
101: 309-320).

   Bioinformatics, the collection and use of scientific database
entries to predict the structure or behavior or evolutionary
relatedness of particular biological macromolecules based on sequence
similarity or structural similarity to known macromolecules, is one of
the fastest growing scientific disciplines. The ability of the
scientific community to ``data mine'' known scientific information is
directly dependent on the public availability of all prior art data.
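   As a simple illustration of how structural similarity can be
quantified from 3-D coordinate data, the following minimal sketch
computes the root-mean-square deviation (RMSD) between two sets of
matched atomic positions, i.e., the square root of the mean squared
distance between corresponding atoms. It assumes that both structures
list the same atoms in the same order and have already been
superimposed; in practice an optimal superposition (for example, by the
Kabsch algorithm) is computed first, and that step is omitted here. The
coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]

def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between matched atom positions.

    Assumes both coordinate sets list the same atoms in the same order
    and are already superimposed; no fitting is performed here.
    """
    if len(a) != len(b) or not a:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(a, b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(a))

# Hypothetical coordinates (in angstroms) for three matched atoms.
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
variant = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 1.0)]
print(round(rmsd(model, variant), 3))  # prints 0.577
```

A smaller RMSD indicates more closely matching structures; identical
coordinate sets give an RMSD of zero.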

   The worldwide Protein Data Bank (wwPDB; http://www.wwpdb.org/index.html)
is a collection of all publicly available 3-D structural data for large
biological molecules, such as proteins and nucleic acids, experimentally
determined by X-ray crystallography and NMR; these data are freely
available to the global community. The PDB, which is under the
oversight of the Research Collaboratory for Structural Bioinformatics
(RCSB, USA), the Macromolecular Structure Database (MSD) at the
European Bioinformatics Institute (EBI) and the Protein Data Bank Japan
(PDBj) at the Institute for Protein Research, has grown from 7
structures in 1971 to a database containing over 30,900 structures as
of May 2005. The PDB's growth has been accompanied by increases in both
data content and the structural complexity of individual entries. A
further acceleration in growth is anticipated as the result of
developments in high-throughput structural determination methodologies
and worldwide structural genomics efforts (Westbrook, et al. 2003 Nucl.
Acids Res. 31(1): 489-491).

   There are also many secondary sources of 3-D protein crystal data
and associated information. One of these is the Molecular Modeling
Database (MMDB), maintained as part of the Entrez search system by the
National Center for Biotechnology Information (NCBI;
http://www.ncbi.nlm.nih.gov/), which is a compilation of all of the
PDB 3-D structures of biomolecules and additionally integrates
value-added chemical, sequence and structural information in order to
facilitate structure-based homology modeling and protein structure
prediction. The goal of Entrez's 3-D-structure database is to make protein
crystal structure information, and the functional annotation MMDB adds,
easily accessible to molecular biologists (Wang, et al. 2002 Nucl. Acids
Res. 30(1): 249-252).

   All of the major 3-D protein crystal databases use a variant of the
Crystallographic Information File (CIF) format as the means for
obtaining data entries with proper annotation. Ratified in 1990 by the
International Union of Crystallography (IUCr), CIF is a format that
enables the characterization of small-molecule crystal structures. In
1997, the CIF format was extended to include information specific to
macromolecules, resulting in version 1.0 of the macromolecular
Crystallographic Information File (mmCIF) dictionary (Bourne, et al.
1997 Meth. Enzymol. 277: 571-590). The PDB database initially accepted
files in its own PDB format in 1971, but now accepts all files in, and
is converting its backfile to, mmCIF. Some databases, especially those
involved in secondary, value-added information, have further modified
the mmCIF format to include more data fields and annotations. MMDB uses
Abstract Syntax Notation One (ASN.1), with data specifications defined by
the NCBI that address structural and functional linkages. The ASN.1
format also allows the structure to be visualized with a 3-D viewer.
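   To make the CIF/mmCIF approach concrete, the following minimal sketch
reads the rows of an abbreviated mmCIF "_atom_site" loop, the category
that holds atomic coordinates. It is an illustration under simplifying
assumptions, not a conforming CIF parser: it assumes whitespace-separated
values with no quoted strings or multi-line text fields, and only a
handful of the usual "_atom_site" items are shown. The sample atom
records and the function name read_atom_site are hypothetical.

```python
from typing import Dict, List

# Hypothetical, abbreviated mmCIF fragment (real files carry more items).
SAMPLE = """\
data_EXAMPLE
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 N N  GLY 10.123 5.456 -2.789
ATOM 2 C CA GLY 11.400 6.010 -2.300
"""

def read_atom_site(text: str) -> List[Dict[str, str]]:
    """Return one dict per row of the _atom_site loop (simplified reader)."""
    keys: List[str] = []
    rows: List[Dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("_atom_site."):
            keys.append(line[len("_atom_site."):])  # keep the item name only
        elif keys and not line.startswith(("_", "loop_", "data_")):
            values = line.split()  # assumes no quoted or multi-line values
            if len(values) != len(keys):
                raise ValueError(f"expected {len(keys)} values, got {len(values)}")
            rows.append(dict(zip(keys, values)))
        elif rows:
            break  # a new item, loop, or data block ends the _atom_site loop
    return rows

for atom in read_atom_site(SAMPLE):
    xyz = tuple(float(atom[k]) for k in ("Cartn_x", "Cartn_y", "Cartn_z"))
    print(atom["label_atom_id"], atom["label_comp_id"], xyz)
```

Because each value is tied to a named dictionary item, the same file can
be checked, searched, and converted without reference to any particular
software product.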

   In addition to databases containing information on the crystal
structures of biomolecules, there are major repositories for other types
of crystal structures. The Cambridge Structural Database (CSD), maintained
by the Cambridge Crystallographic Data Centre (CCDC;
http://www.ccdc.cam.ac.uk/), is a worldwide repository of small molecule
crystal structures and has over 300,000 organic and metallo-organic
compound records. The CSD accepts entries in the CIF data format as
plain ASCII text. Repositories for other types of crystal structures
include: the Nucleic Acid Database (NDB; http://ndbserver.rutgers.edu/),
which stores oligonucleotides; the Inorganic Crystal Structure Database
(ICSD; http://www.fiz-informationsdienste.de/en/DB/icsd/); and
CRYSTMET(R) (http://www.tothcanada.com/), which stores metals and alloys.

   3. Background Specific to Chemical Structural Data: While the use
of drawings to denote specific molecular relationships and chemical
bonds is a very old art, the embodiments and uses of these drawings are
evolving rapidly as supporting technology evolves. The two main methods
for handling chemical data are: (1) chemical drawing systems, which
depend on annotations added to unique substance records stored in
specific electronic file-types; and (2) text files containing a
compilation of data that determines a unique, canonical representation
of a structure.
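   As an elementary illustration of what "canonical" means for a text
representation (the same input always yields the same string), the
following sketch writes a molecular formula in Hill order: carbon first,
then hydrogen, then the remaining elements alphabetically; when no carbon
is present, all elements, including hydrogen, are listed alphabetically.
This is a deliberately simple example chosen for illustration: a formula
is canonical only for composition and cannot distinguish isomers (ethanol
and dimethyl ether both give C2H6O), which is why structure-level text
identifiers are needed. The function name hill_formula is hypothetical.

```python
from collections import Counter
from typing import Iterable

def hill_formula(atoms: Iterable[str]) -> str:
    """Canonical molecular formula (Hill order) from a list of element symbols."""
    counts = Counter(atoms)
    if "C" in counts:
        # Carbon first, hydrogen second, then everything else alphabetically.
        order = ["C"] + (["H"] if "H" in counts else [])
        order += sorted(e for e in counts if e not in ("C", "H"))
    else:
        # Without carbon, every element (hydrogen included) is alphabetical.
        order = sorted(counts)
    # A count of 1 is conventionally omitted.
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)

print(hill_formula(["C", "C", "O"] + ["H"] * 6))  # prints C2H6O
print(hill_formula(["Cl", "C", "Cl", "H", "Cl"]))  # prints CHCl3
print(hill_formula(["O", "H", "H"]))  # prints H2O
```

Note that the order in which atoms are supplied does not affect the
result, which is the property a canonical representation must have.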

   Electronic files containing drawings created by chemical drawing
software would provide the most accessible data set for processing, use
in searching, and public dissemination. However, there is currently no
single, publicly available software package that has been accepted as
the standard for this type of drawing. Some publicly available chemical
data depiction systems are: (1) SMILES
(http://www.daylight.com/dayhtml/smiles/); (2) SMARTS/SMIRKS
(http://www.daylight.com/dayhtml/doc/theory/theory.rxn.html#RTFrxn18);
(3) ACD ChemSketch (http://www.acdlabs.com/download/); and (4) MDL
ISIS/Draw (http://www.mdli.com/downloads/downloadable/index.jsp). Some
proprietary chemical data depiction systems are: (1) ChemDraw
(http://www.cambridgesoft.com/products/family.cfm?FID=2); (2) ACD/Name
(http://www.acdlabs.com/products/name_lab/); (3) Chemistry 4-D Draw
(http://www.cheminnovation.com/products/chem4d.asp); and (4) ChemWindow
(http://www.bio-rad.com/).

   One of the difficulties facing the USPTO in moving toward
acceptance of chemical drawings in electronic format is the
proliferation of proprietary software and file-types. Prior to filing a
patent application, many applicants have already created drawings of
chemical structures of interest for publication or presentation
purposes; however, these drawings could be in one of many publicly
available file-types, or in a file-type specific to a particular
software product. It is not possible to require applicants to purchase
proprietary drawing software, nor is it possible to accept and handle
all possible file-types.
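   Some chemical file-types, by contrast, are documented plain text that
can be read without the software that created them. The following
minimal sketch reads the atom and bond blocks of an MDL V2000 molfile
(.mol), a widely used connection-table file-type, using its fixed-width
columns. It assumes a V2000 file and ignores property lines (such as
charges) that follow the bond block; the methanol sample (hydrogens
omitted) and the function name parse_v2000 are hypothetical.

```python
from typing import List, Tuple

Atom = Tuple[str, float, float, float]  # (element symbol, x, y, z)
Bond = Tuple[int, int, int]              # (first atom, second atom, bond type)

# Hypothetical methanol connection table (hydrogens omitted for brevity).
SAMPLE_MOL = (
    "methanol\n"
    "  hypothetical example\n"
    "\n"
    "  2  1  0  0  0  0  0  0  0  0999 V2000\n"
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n"
    "    1.4300    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\n"
    "  1  2  1  0  0  0  0\n"
    "M  END\n"
)

def parse_v2000(text: str) -> Tuple[List[Atom], List[Bond]]:
    """Read atom and bond blocks from a V2000 molfile using fixed columns."""
    lines = text.splitlines()
    counts = lines[3]  # three header lines precede the counts line
    if "V2000" not in counts:
        raise ValueError("only V2000 connection tables are handled here")
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atoms: List[Atom] = []
    for line in lines[4:4 + n_atoms]:
        # x, y, z occupy three 10-character fields; the symbol follows a space.
        x, y, z = (float(line[i:i + 10]) for i in (0, 10, 20))
        atoms.append((line[31:34].strip(), x, y, z))
    bonds: List[Bond] = []
    start = 4 + n_atoms
    for line in lines[start:start + n_bonds]:
        # First atom, second atom, and bond type: three 3-character fields.
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds

atoms, bonds = parse_v2000(SAMPLE_MOL)
print(atoms, bonds)
```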

   One alternative to requiring a non-standard publicly available
format, requiring a proprietary format, or accepting a multiplicity of
drawing file-types would be the use of a standardized text format to
describe a chemical structure. Two possibilities for this type of file
are: Chemical Markup Language (CML; http://www.xml-cml.org/), or a
joint effort currently under way between the International Union of
Pure and Applied Chemistry and the National Institute of Standards and
Technology, the IUPAC-NIST Chemical Identifier (INChI;
http://www.iupac.org/projects/2000/2000-025-1-800.html). A description
of INChI states that it would enable an automatic conversion to a
graphical representation of a chemical substance that could be performed
anywhere in the world, and could be built into desktop chemical structure
drawing packages and on-line chemical structure drawing applets (A.J.
McNaught 2001 http://www.iupac.org/nomenclature/chem_id_project.html).
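   To show what a layered text identifier of this kind looks like, the
following sketch splits an InChI string into its main layers. It assumes
the layout of current standard InChI strings (a version prefix, then the
molecular formula, then layers each introduced by a one-letter tag such
as "c" for atom connections and "h" for hydrogen atoms); the format has
evolved since the 2001 description cited above, and this sketch neither
validates nor interprets the layers, and keeps only the last layer if a
tag repeats. The function name split_inchi is hypothetical; the input is
the standard InChI for ethanol.

```python
from typing import Dict

def split_inchi(inchi: str) -> Dict[str, str]:
    """Split an InChI string into version, formula, and tagged layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    version, formula, *layers = inchi[len(prefix):].split("/")
    parts = {"version": version, "formula": formula}
    for layer in layers:
        # Each later layer starts with a one-letter tag, e.g. 'c' or 'h'.
        parts[layer[0]] = layer[1:]
    return parts

print(split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"))
# prints {'version': '1S', 'formula': 'C2H6O', 'c': '1-2-3', 'h': '3H,2H2,1H3'}
```

Because the string is plain text, it can be stored, compared, and
searched without any drawing software.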

Rule Making Considerations

   Executive Order 13132: This rule making does not contain policies
with federalism implications sufficient to warrant preparation of a
Federalism Assessment under Executive Order 13132 (Aug. 4, 1999).

   Executive Order 12866: This rule making has been determined to be
not significant for purposes of Executive Order 12866 (Sept. 30, 1993).

   Paperwork Reduction Act: This notice involves information
collection requirements which are subject to review by the Office of
Management and Budget (OMB) under the Paperwork Reduction Act of 1995
(44 U.S.C. 3501 et seq.). The collections of information involved in
this notice have been reviewed and previously approved by OMB under OMB
control numbers: 0651-0022, 0651-0024, 0651-0031, and 0651-0032. The
principal impact of the changes under consideration in this advance
notice would be to revise the rules of practice to require or provide for
the submission of chemical and three-dimensional (3-D) biological
structural data in electronic form. The Office is not resubmitting any
information collection package to OMB for its review and approval
because this advance notice does not propose any changes that would
affect the information collection requirements associated with the
information collection under these OMB control numbers. If the Office
proceeds with proposing changes to the rules of practice relating to
the submission of chemical and three-dimensional (3-D) biological
structural data in electronic form, the Office will resubmit an
information collection package to OMB for its review and approval for
any collections of information whose requirements will be revised as a
result of the proposed rule changes.

   Interested persons are requested to send comments regarding these
information collections, including suggestions for reducing this
burden, to Robert J. Spar, Director, Office of Patent Legal
Administration, Commissioner for Patents, P.O. Box 1450, Alexandria, VA
22313-1450, or to the Office of Information and Regulatory Affairs,
Office of Management and Budget, New Executive Office Building, Room
10235, 725 17th Street, N.W., Washington, D.C. 20503, Attention: Desk
Officer for the Patent and Trademark Office.

   Notwithstanding any other provision of law, no person is required
to respond to nor shall a person be subject to a penalty for failure to
comply with a collection of information subject to the requirements of
the Paperwork Reduction Act unless that collection of information
displays a currently valid OMB control number.

   4. Comments on the following Questions and Any Other Related
Matters Are Solicited:

A. Questions Pertaining to the Creation of 3-D Structural Data Files

   1. What benefits do you foresee for the applicant if electronic
filing is adopted? What disadvantages do you foresee?

   2. What types of 3-D data would be best submitted electronically?

Examples:

 . Small organic crystals.
 . Macromolecular peptide/protein crystals.
 . Inorganic crystals.
 . Metallic crystals.
 . Other.

   3. Should electronic submission of 3-D data be mandatory, optional,
or mandatory for some types (e.g., protein crystals) and optional for
others (e.g., small organic crystals)?

   4. If electronic submission is mandatory, should the USPTO require
all 3-D information cited in an application to be submitted in electronic
format, including prior art, or only new data?

   5. Have tables of 3-D data generally been created for other
purposes before preparation of a patent application, e.g., for
publication in a scientific journal or submission to a database? If so,

. What format(s) are used (e.g., mmCIF, pdb, CIF, other)?
. What authoring tool is used to create the files, e.g., ADIT
(http://pdb.rutgers.edu/mmcif/ADIT/index.html)?
. What software, if any, is used to validate files of 3-D data, e.g.,
ADIT Validation Tool or enCIFer
(http://www.ccdc.cam.ac.uk/free_services/encifer/)?

   6. Have most of the 3-D tables been submitted to a database before
inclusion in a patent application? If so, which one?

Examples:

 . http://www.ccdc.cam.ac.uk/products/csd/
 . http://www.rcsb.org/pdb/
 . http://www.fiz-informationsdienste.de/en/DB/icsd/
 . http://www.tothcanada.com/

   7. Have most of the 3-D tables been published before inclusion in a
patent application?

   8. Database providers require certain annotation data. Would any of
the annotation data currently required by 3-D database providers be
unknown or proprietary at the time of filing a patent application
(e.g., method used for crystal creation)?

   9. Database providers often establish a controlled vocabulary for
annotation or feature description information. Would there be any
problems created during patent application prosecution if the
electronic file relied on dynamic controlled dictionaries or
vocabularies, controlled and maintained by database providers, not the
USPTO, for the description of features, etc.? What would be the pros and
cons if the USPTO were to incorporate by reference a public database
controlled vocabulary into any adopted standard?

Examples:

 . http://pdb.rutgers.edu/cc_dict_tut.html
 . http://ndbserver.rutgers.edu/mmcif/dictionaries/index.html

   10. Is there annotation data specific to a patent application that
does not appear in public database files but that would be desirable to
provide for an electronic submission in a patent application (e.g.,
continuing application data, attorney's docket number)?

   11. Do many/most file wrapper submissions with 3-D data contain
multiple 3-D tables?

B. Questions Pertaining to the USPTO Receipt of 3-D Files

   1. In general, 3-D structure data tables submitted as part of a
patent application are quite lengthy. Should the USPTO require that all
3-D files greater than a certain size be submitted in electronic media
only?

   2. Should the USPTO require submission in electronic format at the
time of filing, or, if a paper copy is filed, permit the electronic
submission to be filed later (with a statement indicating that the
electronic version is the same as the version originally filed)?

   3. Should any statement that comes with an electronic file outline
the authoring tool and certify the use of a validation tool?

   4. Should the rules be revised to specify that 3-D biological
structural data, if a paper copy is provided, is to appear in a special
section, e.g., between the specification and the Sequence Listing?

C. Questions Pertaining to the Use of 3-D Electronic Files by the USPTO
Examiners/STIC Personnel

   1. If enough patent applications directed to 3-D structures are
filed to justify pursuing some form of search capability (a 3-D file
search, not the standard sequence search and text search already
performed), what databases should be investigated?

   2. What software viewer would be recommended for visual
interpretation of the text tables? Examples:

 . http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml
 . http://products.cambridgesoft.com/ProdInfo.cfm?pid=285
 . http://www.proteinscope.com/
 . http://www.candomultimedia.com/medical/

D. Questions Pertaining to 3-D File Export to a Public Database Partner

   1. If the USPTO receives 3-D structural data in electronic form,
the USPTO would likely be able to export the data to a searchable
public database upon publication of the application or patent grant.
What databases should be investigated for a USPTO export arrangement?

   2. Would public databases be willing to work with the USPTO in
developing acceptable formats and annotations, if that would be the
best submission practice for applicants?

E. Questions Pertaining to the USPTO Publication of 3-D Files

   1. Should all 3-D files be posted on the USPTO's Publication Site
for Issued and Published Sequences (PSIPS; http://seqdata.uspto.gov/)?

   2. Should the files be part of the text or image of the patent
application publication or patent grant aside from electronic posting
on PSIPS?

F. Question Pertaining to 3-D File Export to the USPTO Customers

   The USPTO would be exporting in a new file-type; would this have an
adverse or beneficial impact on the USPTO customers?

G. Questions Pertaining to the Creation of Chemistry Structural Data
Files

   1. What benefits do you foresee for the applicant if electronic
filing is adopted? What disadvantages do you foresee?

   2. Has a structural chemistry data file or drawing generally been
created for other purposes before preparation of a patent application,
e.g., for publication in a scientific journal or submission to a
database? If so, in what format: .mol, .cdx, CML, INChI, other?

   3. If drawing tools are used by applicants, which tools are
generally used to create the files, e.g., ChemDraw, ISIS/Draw, ACD/Name?

 . http://www.cambridgesoft.com/products/family.cfm?FID=2
 . http://www.mdli.com/products/framework/isis_draw/index.jsp
 . http://www.acdlabs.com/products/name_lab/name/

   4. Is there annotation data that should be added to the drawings?
What annotations? How would applicants prefer to add additional data?

   5. Would applicants want to cite inventors, attorneys, continuing
application data, attorney's docket number, etc.?

   6. Should the USPTO require all structures cited in a patent
application be submitted in electronic format? Only new data (not prior
art)? Only a representative drawing? Only the "actual invention"
after restriction of the claims and election of an invention?

   7. Would a single representation be deemed a limitation to
applicant's disclosure?

   8. Do many/most file wrapper submissions with chemical structures
contain multiple chemical structure drawings?

   9. Have any chemical drawings generally been submitted to a public
entity (e.g., a database or journal) before the filing of a patent
application?

   10. Have most of the drawings been published before the filing of a
patent application?

   11. Would it be a hardship for applicants if the USPTO required
drawings in a proprietary software format?

   12. Would it be a hardship for applicants if the USPTO required
drawings in a text format that is not yet supported by the major
drawing software tools?

 . How well known is the CML format?
 . http://www.xml-cml.org/

 . How well known is the INChI format?
 . http://www.iupac.org/publications/ci/2001/may/project_2000-025-1-050.html
 . http://www.iupac.org/projects/2000/2000-025-1-800.html#clip

   13. What is the state of the art for chemical drawings?

 . http://www.iupac.org/publications/ci/2002/2404/XML.html

H. Questions Pertaining to the USPTO Receipt of Chemistry Structure
Files

   1. Chemical structure data received by the USPTO varies widely in
size. Should the USPTO require that all chemical structure files
greater than a certain size be submitted in electronic media only?

   2. Should the USPTO require submission in electronic format at the
time of filing, or, if a paper copy is filed, permit the electronic
submission to be filed later (with a statement indicating that the
electronic version is the same as the version originally filed)?

   3. Should the rules be revised to specify that chemical structure
data, if a paper copy is supplied, is to appear in a special section,
e.g., between the specification and the Sequence Listing, or as part of
the drawings?

   4. Chemical structures are often presented in the specification and
claims in Markush format wherein a basic structure is defined, but
portions thereof are variable. Are there drawing tools available that
accurately render these types of structures? If not, what approach
should the USPTO take to ensure that the data submitted appropriately
reflects the invention described or claimed in the patent application?
For example, the USPTO could require: an "exemplary" drawing at the
time of filing; a drawing at the time of a restriction election, e.g.,
a single embodiment of a Markush claim; or, possibly multiple drawings.

   5. The USPTO needs to have certain data associated with files.
Since there is no annotation data in chemical drawing files, should the
USPTO require a "read me" text file to accompany the drawing file?
Should the title of the file be the name of the drawing?

I. Question Pertaining to the Use of Chemistry Structure Files by the
USPTO Examiners/STIC Personnel

   If a chemical structure drawing were required at the time of
filing, how often might it have so many variables (that may be subject
to a restriction/election requirement) that it cannot be effectively
searched? If this is likely to be problematic, how can the USPTO
effectively require submission of a representative drawing to be
searched and, possibly, published?
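   The combinatorial growth that can make a heavily variable Markush
structure difficult to search may be illustrated with a minimal sketch.
It assumes a hypothetical core written as a SMILES template with two
variable positions, R1 and R2, each drawn from a short hypothetical list
of substituents; the names CORE, GROUPS, enumerate_markush, and
count_members are likewise hypothetical. Real Markush claims can include
variable ring systems, repeat counts, and nested groups that simple
string substitution does not capture.

```python
from itertools import product
from math import prod
from typing import Dict, Iterator, List

# Hypothetical para-disubstituted benzene core; {R1} and {R2} are variable.
CORE = "{R1}c1ccc({R2})cc1"
# Hypothetical substituent choices, written as SMILES fragments.
GROUPS: Dict[str, List[str]] = {
    "R1": ["C", "O", "N"],    # methyl, hydroxy, amino
    "R2": ["F", "Cl", "Br"],  # halogens
}

def enumerate_markush(core: str, groups: Dict[str, List[str]]) -> Iterator[str]:
    """Yield every specific structure covered by the template."""
    names = sorted(groups)
    for choice in product(*(groups[n] for n in names)):
        yield core.format(**dict(zip(names, choice)))

def count_members(groups: Dict[str, List[str]]) -> int:
    """Number of specific structures: the product of the option counts."""
    return prod(len(options) for options in groups.values())

members = list(enumerate_markush(CORE, GROUPS))
print(count_members(GROUPS), members[0])  # prints 9 Cc1ccc(F)cc1
```

Each added variable position multiplies the count: ten positions with ten
choices each would already cover 10^10 specific structures, which is why
a single representative drawing may be more practical to search.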

J. Questions Pertaining to Chemistry Structure File Export to a Public
Database Partner

   1. Should the USPTO send chemical structure data files to a public
database partner? If so, which one(s)?

   2. Should the USPTO export data to CAS for inclusion in the
Registry file? What about other private providers?
 . http://www.cas.org/EO/regsys.html

K. Question Pertaining to the USPTO Publication of Chemistry Structure
Files

   1. Should all chemistry structure files be posted on the USPTO's
Publication Site for Issued and Published Sequences (PSIPS;
http://seqdata.uspto.gov/), or should the chemistry drawing be published
with the TIFF images of the patent application publication or patent grant?

L. Question Pertaining to Chemistry Structure File Export to the USPTO
Customers

   1. Should we change the drawing files that are sent to the USPTO
customers?

 . Currently, .cdx, .mol, and TIFF versions are present (Note: common to
Patent and Trademark Applications)

June 15, 2005                                                  JON W. DUDAS
                                            Under Secretary of Commerce for
                                  Intellectual Property and Director of the
                                  United States Patent and Trademark Office