2412 The Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence Disclosures to Include a Sequence Listing in XML file format [R-01.2024]
[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b).]
Patent applications that disclose a nucleotide and/or amino acid sequence(s) by enumeration of its residues, as defined in 37 CFR 1.831(b), must present each sequence and associated sequence data in a standardized electronic eXtensible Markup Language (XML) format (a “Sequence Listing XML”) as a separate part of the specification. This standardized format is set forth in the World Intellectual Property Organization (WIPO) Standard ST.26, and applies to sequence listings in international applications filed under the Patent Cooperation Treaty (PCT) and in national and regional applications filed in the intellectual property offices (IPOs) of WIPO member states. As a result, a single sequence listing in compliance with WIPO Standard ST.26 can be prepared for use in all IPOs of WIPO member states. The regulatory provisions found at 37 CFR 1.831 - 1.835 implement WIPO Standard ST.26 in the USPTO and set forth requirements for presenting sequence data in patent applications filed on or after July 1, 2022, that disclose nucleotide sequences and/or amino acid sequences.
WIPO Standard ST.26 is incorporated by reference into the USPTO regulations by including new regulatory text at 37 CFR 1.839:
37 CFR 1.839 Incorporation by reference.
- (a) Certain material is incorporated by reference into this subpart with the approval of the Director of the Federal Register under 5 U.S.C. 552(a) and 1 CFR part 51. All approved incorporation by reference (IBR) material is available for inspection at the USPTO and at the National Archives and Records Administration (NARA). Contact the USPTO’s Office of Patent Legal Administration at 571–272–7701. For information on the availability of this material at NARA, email fr.inspection@nara.gov or go to www.archives.gov/federal-register/cfr/ibr-locations.html. The material may be obtained from the source(s) in paragraph (b) of this section.
- (b) World Intellectual Property Organization (WIPO), 34 chemin des Colombettes, 1211 Geneva 20 Switzerland, www.wipo.int.
- (1) WIPO Standard ST.26. WIPO Handbook on Intellectual Property Information and Documentation, Standard ST.26: Recommended Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings Using XML (eXtensible Markup Language) including Annexes I–VII, version 1.6, approved November 25, 2022; IBR approved for §§ 1.831 through 1.834.
- (2) [Reserved]
For ease of access, WIPO Standard ST.26 can be found at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf
A link to WIPO Standard ST.26 is also found on the USPTO’s Sequence Listing Resource Center:
www.uspto.gov/patents/apply/sequence-listing-resource-center
2412.01 Overview of the Sequence Rules [R-01.2024]
[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b).]
Under the sequence listing rules, 37 CFR 1.831 - 1.834, an application having a filing date on or after July 1, 2022, that discloses one or more nucleotide and/or amino acid sequences by enumeration of its residues, as defined in 37 CFR 1.831(b), must contain, as a part of the description, a sequence listing in eXtensible Markup Language (XML) format, where the XML file of the sequence information conforms to the requirements of 37 CFR 1.831 - 1.834, which specify requirements of particular paragraphs of WIPO Standard ST.26. For U.S. applications, the sequence rules do not require that all sequences and associated sequence information contained in the required sequence listing part of the description be disclosed elsewhere in the application; the content of a sequence listing is considered part of the disclosure of the invention. However, for international applications, all sequences and associated sequence information contained in the required sequence listing part of the description are considered an application part if present on the filing date without an incorporation by reference.
2412.02 Definition of “Sequence Listing XML” [R-01.2024]
[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b).]
37 CFR 1.831 Requirements for patent applications filed on or after July 1, 2022, having nucleotide and/or amino acid sequence disclosures.
- (a) Patent applications disclosing nucleotide and/or amino acid sequences by enumeration of its residues, as defined in paragraph (b) of this section, must contain, as a separate part of the disclosure, a computer readable Sequence Listing in XML format (a “Sequence Listing XML”). Disclosed nucleotide or amino acid sequences that do not meet the definition in paragraph (b) of this section must not be included in the “Sequence Listing XML.” The “Sequence Listing XML” contains the information of the nucleotide and/or amino acid sequence(s) disclosed in the patent application using the symbols and format in accordance with the requirements of §§ 1.832 through 1.834.
-
*****
For 35 U.S.C. 111 applications and international applications filed on or after July 1, 2022, that disclose one or more nucleotide and/or amino acid sequences as enumerated by its residues, each nucleotide and/or amino acid sequence and associated sequence data must be presented as a separate part of the disclosure that comprises the sequences in computer readable XML format in accordance with WIPO Standard ST.26 as implemented by 37 CFR 1.831 - 1.834. This sequence listing is referred to as a “Sequence Listing XML” in order to distinguish it from a “Sequence Listing” submitted in an application having a filing date before July 1, 2022. For 35 U.S.C. 111 applications and international applications having a filing date before July 1, 2022, that disclose nucleotide and/or amino acid sequences, see 37 CFR 1.821(c) and 1.821(e)(1). See also MPEP §§ 2421.01 and 2421.02.
2412.02(a) “Enumeration of its residues” [R-01.2024]
[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b).]
37 CFR 1.831 Requirements for patent applications filed on or after July 1, 2022, having nucleotide and/or amino acid sequence disclosures.
-
*****
- (d) “Enumeration of its residues” means disclosure of a nucleotide or amino acid sequence in a patent application by listing, in order, each residue of the sequence, where the residues are represented in the manner as defined in paragraph 3(c)(i) or (ii) of WIPO Standard ST.26 (incorporated by reference, see § 1.839).
-
*****
WIPO Standard ST.26 specifies that “enumeration of its residues” means “disclosure of a sequence in a patent application by listing, in order, each residue of the sequence, wherein either (i) the residue is represented by a name, abbreviation, symbol, or structure (e.g., HHHHHHQ or HisHisHisHisHisHisGln); or (ii) multiple residues are represented by a shorthand formula (e.g., His6Gln)” (WIPO Standard ST.26, paragraph 3(c)). See also the Guidance portion of the Standard, Annex VI, page 3.26.vi.2 for a discussion of “enumeration of its residues” and pages 3.26.vi.18-19, 51, and 53 for examples.
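The equivalence between the shorthand formula and the fully written-out forms in the example above can be illustrated with a minimal Python sketch. The function name `expand_shorthand` and the three-letter lookup table are illustrative assumptions (the three-letter codes follow common biochemical convention and are not reproduced from the Standard), and the parser handles only simple formulas of the form shown in the example.

```python
import re

# Conventional three-letter to one-letter amino acid codes (an assumption of
# this sketch; the one-letter symbols match those used in ST.26).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Pyl": "O", "Ser": "S", "Sec": "U", "Thr": "T", "Trp": "W",
    "Tyr": "Y", "Val": "V",
}

# One three-letter abbreviation, optionally followed by a repeat count.
TOKEN = re.compile(r"([A-Z][a-z]{2})(\d*)")


def expand_shorthand(formula: str) -> str:
    """Expand e.g. 'His6Gln' into the one-letter enumeration 'HHHHHHQ'."""
    position = 0
    residues = []
    for match in TOKEN.finditer(formula):
        if match.start() != position:
            raise ValueError(f"unrecognized text at offset {position}")
        name, count = match.groups()
        if name not in THREE_TO_ONE:
            raise ValueError(f"unknown residue abbreviation: {name}")
        residues.append(THREE_TO_ONE[name] * (int(count) if count else 1))
        position = match.end()
    if position != len(formula):
        raise ValueError(f"unrecognized text at offset {position}")
    return "".join(residues)
```

Under these assumptions, both `expand_shorthand("His6Gln")` and `expand_shorthand("HisHisHisHisHisHisGln")` yield the same seven-residue enumeration, which is why each form counts as disclosure by enumeration of residues.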
2412.03 Nucleotides and Amino Acids Included and Excluded From a “Sequence Listing XML” [R-01.2024]
[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b).]
37 CFR 1.831 Requirements for patent applications filed on or after July 1, 2022, having nucleotide and/or amino acid sequence disclosures.
-
*****
- (b) Nucleotide and/or amino acid sequences, as used in this section and §§ 1.832 through 1.835, encompass:
- (1) An unbranched sequence or linear region of a branched sequence containing 4 or more specifically defined amino acids, wherein the amino acids form a single peptide backbone; or
- (2) An unbranched sequence or linear region of a branched sequence of 10 or more specifically defined nucleotides, wherein adjacent nucleotides are joined by:
- (i) A 3' to 5' (or 5' to 3') phosphodiester linkage; or
- (ii) Any chemical bond that results in an arrangement of adjacent nucleobases that mimics the arrangement of nucleobases in naturally occurring nucleic acids (i.e., nucleotide analogs).
-
*****
- (j) A “Sequence Listing XML” must not include any sequences having fewer than 10 specifically defined nucleotides, or fewer than 4 specifically defined amino acids.
Generally, nucleotide sequences that are an unbranched sequence or constitute a linear portion of a branched sequence of 10 or more specifically defined nucleotides are required to be listed in a “Sequence Listing XML.” See MPEP §§ 2412.03(d) and (e) for definitions of nucleotide and modified nucleotide, respectively. See MPEP § 2412.03(a) for definition of “specifically defined.”
Similarly, amino acid sequences that are an unbranched sequence or constitute a linear portion of a branched sequence of 4 or more specifically defined amino acids are required to be listed in a “Sequence Listing XML.” See MPEP §§ 2412.03(b) and (c) for definitions of amino acid and modified amino acid, respectively. See MPEP § 2412.03(a) for definition of “specifically defined.”
37 CFR 1.831(b) sets forth the nucleotide and amino acid sequences which must be included in a “Sequence Listing XML.” 37 CFR 1.831(j) specifies that any sequence having fewer than 10 specifically defined nucleotides, or fewer than 4 specifically defined amino acids, must be excluded from any “Sequence Listing XML.”
2412.03(a) “Specifically Defined” [R-01.2024]
[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b).]
37 CFR 1.831 Requirements for patent applications filed on or after July 1, 2022, having nucleotide and/or amino acid sequence disclosures.
-
*****
- (e) “Specifically defined” means any amino acid or nucleotide as defined in paragraph 3(k) of WIPO Standard ST.26.
-
*****
WIPO Standard ST.26, paragraph 3(k), provides that “specifically defined” means any nucleotide other than those represented by the symbol “n” and any amino acid other than those represented by the symbol “X”, wherein “n” and “X” are used in a conventional manner as shown below in Table 1 for nucleotide symbols and Table 3 for amino acid symbols.
Symbol | Definition |
---|---|
a | adenine |
c | cytosine |
g | guanine |
t | thymine in DNA/uracil in RNA (t/u) |
m | a or c |
r | a or g |
w | a or t/u |
s | c or g |
y | c or t/u |
k | g or t/u |
v | a or c or g; not t/u |
h | a or c or t/u; not g |
d | a or g or t/u; not c |
b | c or g or t/u; not a |
n | a or c or g or t/u; “unknown” or “other” |
Table 1: List of Nucleotides Symbols. Reproduced from WIPO Standard ST.26, Annex I, Section 1.
Symbol | Definition |
---|---|
A | Alanine |
R | Arginine |
N | Asparagine |
D | Aspartic acid (Aspartate) |
C | Cysteine |
Q | Glutamine |
E | Glutamic acid (Glutamate) |
G | Glycine |
H | Histidine |
I | Isoleucine |
L | Leucine |
K | Lysine |
M | Methionine |
F | Phenylalanine |
P | Proline |
O | Pyrrolysine |
S | Serine |
U | Selenocysteine |
T | Threonine |
W | Tryptophan |
Y | Tyrosine |
V | Valine |
B | Aspartic acid or Asparagine |
Z | Glutamine or Glutamic acid |
J | Leucine or Isoleucine |
X | A or R or N or D or C or Q or E or G or H or I or L or K or M or F or P or O or S or U or T or W or Y or V; “unknown” or “other” |
Table 3: List of Amino Acids Symbols. Reproduced from WIPO Standard ST.26, Annex I, Section 3.
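The interaction between the “specifically defined” definition and the length thresholds of 37 CFR 1.831(b) (10 or more nucleotides; 4 or more amino acids) can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and the sketch counts residues without evaluating the other conditions of 37 CFR 1.831(b), such as whether the sequence is unbranched or how adjacent residues are joined.

```python
# Minimum numbers of specifically defined residues, taken from
# 37 CFR 1.831(b) as quoted in MPEP 2412.03.
MINIMUM_SPECIFICALLY_DEFINED = {"nucleotide": 10, "amino_acid": 4}

# Symbols that are NOT specifically defined (ST.26 paragraph 3(k)).
PLACEHOLDER_SYMBOL = {"nucleotide": "n", "amino_acid": "X"}


def count_specifically_defined(residues: str, kind: str) -> int:
    """Count residues other than 'n' (nucleotides) or 'X' (amino acids)."""
    if kind not in PLACEHOLDER_SYMBOL:
        raise ValueError("kind must be 'nucleotide' or 'amino_acid'")
    placeholder = PLACEHOLDER_SYMBOL[kind]
    return sum(1 for residue in residues if residue != placeholder)


def meets_length_threshold(residues: str, kind: str) -> bool:
    """True when the count of specifically defined residues reaches the minimum."""
    minimum = MINIMUM_SPECIFICALLY_DEFINED[kind]
    return count_specifically_defined(residues, kind) >= minimum
```

Note that ambiguity symbols other than “n” and “X” (for example “r” or “B”) are counted as specifically defined in this sketch, consistent with the definition quoted above. Case is deliberately preserved because nucleotide symbols are lower-case and amino acid symbols are upper-case.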
2412.03(b) “Amino Acid” [R-01.2024]
[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b).]
37 CFR 1.831 Requirements for patent applications filed on or after July 1, 2022, having nucleotide and/or amino acid sequence disclosures.
-
*****
- (f) “Amino acid” includes any D- or L-amino acid or modified amino acid as defined in paragraph 3(a) of WIPO Standard ST.26.
-
*****
WIPO Standard ST.26, paragraph 3(a), defines “amino acid” to mean any amino acid that can be represented using any of the symbols shown in Annex I, Table 3: List of Amino Acids Symbols of WIPO Standard ST.26 (reproduced in MPEP § 2412.03(a)). Such amino acids include, inter alia, D-amino acids and amino acids containing modified or synthetic side chains. Amino acids will be construed as unmodified L-amino acids unless further described in a “feature table”. See MPEP § 2413.01(g), subsection I, for discussion of a “feature table”. A peptide nucleic acid (PNA) residue is not considered an amino acid, but is considered a nucleotide.
2412.03(c) “Modified Amino Acid” [R-01.2024]
[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). Formatting representations of XML (eXtensible Markup Language) elements in this section appear different than shown in Standard ST.26, which may be accessed at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf.]
37 CFR 1.831 Requirements for patent applications filed on or after July 1, 2022, having nucleotide and/or amino acid sequence disclosures.
-
*****
- (g) “Modified amino acid” includes any amino acid as described in paragraph 3(e) of WIPO Standard ST.26.
-
*****
WIPO Standard ST.26, paragraph 3(e), defines “modified amino acid” to mean any amino acid as described in the definition of “amino acid”, other than L-alanine, L-arginine, L-asparagine, L-aspartic acid, L-cysteine, L-glutamine, L-glutamic acid, L-glycine, L-histidine, L-isoleucine, L-leucine, L-lysine, L-methionine, L-phenylalanine, L-proline, L-pyrrolysine, L-serine, L-selenocysteine, L-threonine, L-tryptophan, L-tyrosine, or L-valine. See MPEP § 2412.03(b).
Modified amino acids, including D-amino acids, should be represented in the sequence as the corresponding unmodified amino acids whenever possible. Any modified amino acid in a sequence that cannot otherwise be represented by a symbol set forth in Table 3 must be represented by “X”, i.e., an “other” amino acid.
A modified amino acid must be further described in a feature table. See MPEP § 2413.01(g), subsection I, for discussion of a “feature table”. Where applicable, the feature keys “CARBOHYD” or “LIPID” should be used together with the qualifier “note”. The feature key “MOD_RES” should be used for other post-translationally modified amino acids together with the qualifier “note”. The feature key “SITE” together with the qualifier “note” should be used when the modified amino acid is not a post-translationally modified amino acid. See MPEP § 2413.01(g), subsections II and III, for discussion of a “feature key”. The value for the qualifier “note” must either be an abbreviation set forth in Table 4 below or the complete, unabbreviated name of the modified amino acid. The abbreviations set forth in Table 4 or the complete, unabbreviated names must not be used in the sequence itself.
Abbreviation | Modified Amino acid |
---|---|
Aad | 2-Aminoadipic acid |
bAad | 3-Aminoadipic acid |
bAla | beta-Alanine, beta-Aminopropionic acid |
Abu | 2-Aminobutyric acid |
4Abu | 4-Aminobutyric acid, piperidinic acid |
Acp | 6-Aminocaproic acid |
Ahe | 2-Aminoheptanoic acid |
Aib | 2-Aminoisobutyric acid |
bAib | 3-Aminoisobutyric acid |
Apm | 2-Aminopimelic acid |
Dbu | 2,4 Diaminobutyric acid |
Des | Desmosine |
Dpm | 2,2'-Diaminopimelic acid |
Dpr | 2,3-Diaminopropionic acid |
EtGly | N-Ethylglycine |
EtAsn | N-Ethylasparagine |
Hyl | Hydroxylysine |
aHyl | allo-Hydroxylysine |
3Hyp | 3-Hydroxyproline |
4Hyp | 4-Hydroxyproline |
Ide | Isodesmosine |
aIle | allo-Isoleucine |
MeGly | N-Methylglycine, sarcosine |
MeIle | N-Methylisoleucine |
MeLys | 6-N-Methyllysine |
MeVal | N-Methylvaline |
Nva | Norvaline |
Nle | Norleucine |
Orn | Ornithine |
Table 4: List of Modified Amino Acids. Reproduced from WIPO Standard ST.26, Annex I, Section 4.
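The choice of feature key for a modified amino acid, as described in this section, can be summarized in a short Python sketch. The function name, the `kind` labels, and the dictionary layout are hypothetical conveniences for illustration; they are not the XML structure required by the Standard. The sketch does not verify that the “note” value is a Table 4 abbreviation or a complete, unabbreviated name.

```python
# Feature keys named in this section, indexed by a hypothetical "kind" label.
FEATURE_KEY_BY_KIND = {
    "carbohydrate": "CARBOHYD",       # where applicable
    "lipid": "LIPID",                 # where applicable
    "post_translational": "MOD_RES",  # other post-translational modifications
    "other": "SITE",                  # not a post-translational modification
}


def describe_modified_amino_acid(position: int, note: str, kind: str) -> dict:
    """Return a plain-dict description of one modified amino acid annotation.

    `note` should be a Table 4 abbreviation or the complete, unabbreviated
    name of the modified amino acid; it is only checked for being non-empty.
    """
    if kind not in FEATURE_KEY_BY_KIND:
        raise ValueError(f"unknown kind: {kind}")
    if not note.strip():
        raise ValueError("the 'note' qualifier needs a value")
    return {
        "feature_key": FEATURE_KEY_BY_KIND[kind],
        "location": str(position),
        "qualifier": "note",
        "value": note,
    }
```

For example, an ornithine residue at position 5 that is not a post-translational modification would be described with `describe_modified_amino_acid(5, "Orn", "other")`, while the sequence itself would carry the corresponding unmodified symbol or “X” at that position.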
2412.03(d) “Nucleotide” [R-01.2024]
[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). Formatting representations of XML (eXtensible Markup Language) elements in this section appear different than shown in Standard ST.26, which may be accessed at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf.]
37 CFR 1.831 Requirements for patent applications filed on or after July 1, 2022, having nucleotide and/or amino acid sequence disclosures.
-
*****
- (h) “Nucleotide” includes any nucleotide, nucleotide analog, or modified nucleotide as defined in paragraphs 3(f) and 3(g) of WIPO Standard ST.26.
-
*****
WIPO Standard ST.26, paragraphs 3(f) and (g), identify a “nucleotide” to encompass any nucleotide or nucleotide analogue or “modified nucleotide” (see MPEP § 2412.03(e)) that can be represented using any of the symbols set forth in Table 1: List of Nucleotides Symbols (see MPEP § 2412.03(a)), wherein the nucleotide, or nucleotide analogue, or modified nucleotide contains:
- (i) a backbone moiety selected from:
- (1) 2’ deoxyribose 5’ monophosphate (the backbone moiety of a deoxyribonucleotide) or ribose 5’ monophosphate (the backbone moiety of a ribonucleotide); or
- (2) an analogue of a 2’ deoxyribose 5’ monophosphate or ribose 5’ monophosphate, which when forming the backbone of a nucleic acid analogue, results in an arrangement of nucleobases that mimics the arrangement of nucleobases in nucleic acids containing a 2’ deoxyribose 5’ monophosphate or ribose 5’ monophosphate backbone, wherein the nucleic acid analogue is capable of base pairing with a complementary nucleic acid; examples of backbone moieties include amino acids as in peptide nucleic acids, glycol molecules as in glycol nucleic acids, threofuranosyl sugar molecules as in threose nucleic acids, morpholine rings and phosphorodiamidate groups as in morpholinos, and cyclohexenyl molecules as in cyclohexenyl nucleic acids; and
- (ii) the backbone moiety is either:
- (1) joined to a nucleobase, including a modified or synthetic purine or pyrimidine nucleobase; or
- (2) lacking a purine or pyrimidine nucleobase when the nucleotide is part of a nucleotide sequence, referred to as an “AP site” or an “abasic site”.
2412.03(e) “Modified Nucleotide” [R-01.2024]
[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). Formatting representations of XML (eXtensible Markup Language) elements in this section appear different than shown in Standard ST.26, which may be accessed at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf.]
37 CFR 1.831 Requirements for patent applications filed on or after July 1, 2022, having nucleotide and/or amino acid sequence disclosures.
-
*****
- (i) “Modified nucleotide” includes any nucleotide as described in paragraph 3(f) of WIPO Standard ST.26.
-
*****
WIPO Standard ST.26, paragraph 3(f), provides that a “modified nucleotide” means any “nucleotide” as explained in MPEP § 2412.03(d) other than deoxyadenosine 3’-monophosphate, deoxyguanosine 3’-monophosphate, deoxycytidine 3’-monophosphate, deoxythymidine 3’-monophosphate, adenosine 3’-monophosphate, guanosine 3’-monophosphate, cytidine 3’-monophosphate, or uridine 3’-monophosphate.
2412.04 Use of Sequence Identifiers to Denote Sequences Disclosed in the Description, Drawings or Claims [R-01.2024]
[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). Formatting representations of XML (eXtensible Markup Language) elements in this section appear different than shown in Standard ST.26, which may be accessed at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf.]
37 CFR 1.831 Requirements for patent applications filed on or after July 1, 2022, having nucleotide and/or amino acid sequence disclosures.
-
*****
- (c) Where the description or claims of a patent application discuss a sequence that is set forth in the “Sequence Listing XML” in accordance with paragraph (a) of this section, reference must be made to the sequence by use of the sequence identifier, preceded by “SEQ ID NO:” or the like in the text of the description or claims, even if the sequence is also embedded in the text of the description or claims of the patent application. Where a sequence is presented in a drawing, reference must be made to the sequence by use of the sequence identifier (§ 1.832(a)), either in the drawing or in the Brief Description of the Drawings, where the correlation between multiple sequences in the drawing and their sequence identifiers (§ 1.832(a)) in the Brief Description is clear.
*****
37 CFR 1.831(c) requires that each nucleotide and/or amino acid sequence set forth in a “Sequence Listing XML” in accordance with 37 CFR 1.831(a) must be referenced by a sequence identifier as defined in 37 CFR 1.832(a) (see MPEP § 2412.05(a)), preceded by the notation “SEQ ID NO:” or the like, when the sequence appears in the description or claims. Additionally, where a sequence set forth in a “Sequence Listing XML” is presented in a drawing, reference must be made using the sequence identifier preceded by the notation “SEQ ID NO:” or the like, either in the drawing or in the Brief Description of the Drawings. The sequence identifiers in the disclosure must correspond to sequence identifiers set forth in the “Sequence Listing XML” as defined in 37 CFR 1.832(a).
37 CFR 1.831(c) is also intended to permit references in the application (e.g., specification, claims, or drawings) to sequences set forth in the “Sequence Listing XML” by the use of assigned sequence identifiers without enumerating the sequence. Sequence identifiers can also be used to discuss and/or claim parts or fragments of a properly presented sequence. For example, language such as “residues 14 to 243 of SEQ ID NO:23” is permissible and the fragment need not be separately presented in the “Sequence Listing XML.”
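The correspondence between sequence identifiers used in the disclosure and those set forth in the “Sequence Listing XML” can be checked mechanically. The following Python sketch is one hedged illustration: the function names are hypothetical, and the pattern recognizes only the literal notation “SEQ ID NO:” followed by a number, whereas the rule also permits notations “or the like” that this sketch would miss.

```python
import re

# Matches the literal notation "SEQ ID NO:" followed by an integer.
SEQ_ID_PATTERN = re.compile(r"SEQ ID NO:\s*(\d+)")


def referenced_identifiers(text: str) -> set[int]:
    """Collect every sequence identifier cited as 'SEQ ID NO:<n>' in the text."""
    return {int(number) for number in SEQ_ID_PATTERN.findall(text)}


def unmatched_references(text: str, listed_identifiers) -> set[int]:
    """Identifiers cited in the text that are absent from the sequence listing."""
    return referenced_identifiers(text) - set(listed_identifiers)
```

A reference to a fragment, such as “residues 14 to 243 of SEQ ID NO:23”, is detected as a reference to identifier 23; consistent with the discussion above, the fragment itself does not need its own entry in the listing.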
37 CFR 1.831(c) does not alter, in any way, the requirements of 35 U.S.C. 112. The implementation of this rule has had no effect on disclosure and/or claiming requirements. 37 CFR 1.831 - 1.835, in general, or the use of sequence identifiers throughout the specification and claims, specifically, should not raise any issues under 35 U.S.C. 112(a) or 35 U.S.C. 112(b) because the use of sequence identifiers (preceded by “SEQ ID NO:” or the like) only provides a shorthand way for applicants to discuss and claim their inventions. These identifiers do not in any way restrict the manner in which an invention can be claimed.
2412.05 Representation and Symbols for Nucleotide and/or Amino Acid Sequences [R-01.2024]
[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b).]
WIPO Standard ST.26 sets forth specific symbols for representing nucleotide and/or amino acid residues in a sequence. The USPTO rules incorporate those specific symbols and XML formatting representations.
2412.05(a) Use of Sequentially Numbered Sequence Identifiers in the “Sequence Listing XML” [R-01.2024]
[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). Formatting representations of XML (eXtensible Markup Language) elements in this section appear different than shown in Standard ST.26, which may be accessed at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf.]
37 CFR 1.832 Representation of nucleotide and/or amino acid sequence data in the “Sequence Listing XML” part of a patent application filed on or after July 1, 2022.
- (a) Each disclosed nucleotide or amino acid sequence that meets the requirements of § 1.831(b) must appear separately in the “Sequence Listing XML.” Each sequence set forth in the “Sequence Listing XML” must be assigned a separate sequence identifier. The sequence identifiers must begin with 1 and increase sequentially by integers as defined in paragraph 10 of WIPO Standard ST.26 (incorporated by reference, see § 1.839).
-
*****
In accordance with 37 CFR 1.832(a), the sequence identifiers in the “Sequence Listing XML” must begin with 1 and increase sequentially by integers. The requirement for sequence identifiers, at a minimum, requires that each sequence be assigned a different number for purposes of identification. However, where practical and for ease of reference, sequences should be presented in the “Sequence Listing XML” in numerical order and in the order in which they are discussed in the application.
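The numbering requirement of 37 CFR 1.832(a) (identifiers begin with 1 and increase sequentially by integers) reduces to a simple comparison. The sketch below is illustrative only; the function name is hypothetical, and the identifiers are assumed to be supplied in the order in which they appear in the listing.

```python
def identifiers_are_sequential(identifiers) -> bool:
    """True when the identifiers are exactly 1, 2, 3, ... in listing order."""
    ordered = list(identifiers)
    return ordered == list(range(1, len(ordered) + 1))
```

Under this check, a listing numbered 1, 2, 3 passes, while a listing that starts at 0 or omits a number (for example 1, 3, 4) does not.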
Each nucleotide and/or amino acid sequence that meets the definition in 37 CFR 1.831(b) and is enumerated by its residues must be assigned a separate sequence identifier, including a sequence which is identical to a region of a longer sequence. See MPEP § 2412.02 for further description of a “sequence”.
Where no sequence is present for a sequence identifier, i.e., an intentionally skipped sequence, “000” must be used in place of a sequence in a “Sequence Listing XML”. The total number of sequences indicated in the “Sequence Listing XML” must equal the total number of sequence identifiers, whether followed by a sequence or by “000”.
WIPO Standard ST.26, paragraphs 58 and 59, require that an intentionally skipped sequence in a “Sequence Listing XML” be represented as follows:
- (a) the element SequenceData is included, and the value of its attribute sequenceIDNumber is the sequence identifier of the skipped sequence;
- (b) no value is provided for the elements INSDSeq_length, INSDSeq_moltype, and INSDSeq_division;
- (c) the element INSDSeq_feature-table is not included; and
- (d) the value of the element INSDSeq_sequence is the string “000”.
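The four conditions listed above can be illustrated by constructing the corresponding XML fragment with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch, not a complete or validated “Sequence Listing XML”: the function name is hypothetical, and the nesting of the INSDSeq_* elements inside an `INSDSeq` wrapper element is an assumption about the structure defined in the Standard that should be confirmed against WIPO Standard ST.26.

```python
import xml.etree.ElementTree as ET


def skipped_sequence_element(sequence_identifier: int) -> ET.Element:
    """Build a SequenceData element for an intentionally skipped sequence."""
    # (a) SequenceData carries the identifier in its sequenceIDNumber attribute.
    data = ET.Element("SequenceData",
                      {"sequenceIDNumber": str(sequence_identifier)})
    # Assumed wrapper element holding the INSDSeq_* children.
    insdseq = ET.SubElement(data, "INSDSeq")
    # (b) These elements are present but carry no value.
    for name in ("INSDSeq_length", "INSDSeq_moltype", "INSDSeq_division"):
        ET.SubElement(insdseq, name)
    # (c) No INSDSeq_feature-table element is created.
    # (d) The sequence element holds the string "000".
    ET.SubElement(insdseq, "INSDSeq_sequence").text = "000"
    return data
```

Calling `ET.tostring(skipped_sequence_element(3), encoding="unicode")` returns the fragment as text for inspection.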
2412.05(b) Representation and Symbols of Nucleotide Sequence Data [R-01.2024]
[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). Formatting representations of XML (eXtensible Markup Language) elements in this section appear different than shown in Standard ST.26, which may be accessed at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf.]
37 CFR 1.832 Representation of nucleotide and/or amino acid sequence data in the “Sequence Listing XML” part of a patent application filed on or after July 1, 2022.
-
*****
- (b) The representation and symbols for nucleotide sequence data shall conform to the requirements of paragraphs (b)(1) through (4) of this section.
- (1) A nucleotide sequence must be represented in the manner described in paragraphs 11–12 of WIPO Standard ST.26.
- (2) All nucleotides, including nucleotide analogs, modified nucleotides, and “unknown” nucleotides, within a nucleotide sequence must be represented using the symbols set forth in paragraphs 13–16, 19, and 21 of WIPO Standard ST.26.
- (3) Modified nucleotides within a nucleotide sequence must be described in the manner discussed in paragraphs 17, 18, and 19 of WIPO Standard ST.26.
- (4) A region containing a known number of contiguous “a,” “c,” “g,” “t,” or “n” residues for which the same description applies may be jointly described in the manner described in paragraph 22 of WIPO Standard ST.26.
-
*****
WIPO Standard ST.26, paragraph 11, provides that a nucleotide sequence must be represented only by a single strand, in the 5’ to 3’ direction from left to right, or in the direction from left to right that mimics the 5’ to 3’ direction. The designations 5’ and 3’ or any other similar designations must not be included in the sequence. A double-stranded nucleotide sequence disclosed by enumeration of the residues of both strands must be represented as:
- (a) a single sequence or as two separate sequences, each assigned its own sequence identifier, where the two separate strands are fully complementary to each other, or
- (b) two separate sequences, each assigned its own sequence identifier, where the two strands are not fully complementary to each other.
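Whether two enumerated strands are fully complementary, and therefore whether a single sequence may represent both, can be tested as sketched below. The sketch assumes both strands are written in the 5’ to 3’ direction using only the symbols “a”, “c”, “g”, and “t”; ambiguity symbols and modified residues are outside its scope, and the function names are hypothetical.

```python
# Watson-Crick complements for the four unambiguous symbols.
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, also written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def fully_complementary(strand_1: str, strand_2: str) -> bool:
    """True when strand_2 pairs with strand_1 over its entire length."""
    return strand_2 == reverse_complement(strand_1)


def minimum_sequences_required(strand_1: str, strand_2: str) -> int:
    """1 if a single sequence may represent both strands, otherwise 2."""
    return 1 if fully_complementary(strand_1, strand_2) else 2
```

Even where the result is 1, item (a) above permits the applicant to present the strands as two separate sequences, each with its own sequence identifier.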
WIPO Standard ST.26, paragraph 12, provides that the first nucleotide presented in the sequence is residue position number 1. When nucleotide sequences are circular in configuration, applicant must choose the nucleotide in residue position number 1. Numbering is continuous throughout the entire sequence in the 5’ to 3’ direction, or in the direction that mimics the 5’ to 3’ direction. The last residue position number must equal the number of nucleotides in the sequence.
II. SYMBOLS FOR A NUCLEOTIDE SEQUENCE
WIPO Standard ST.26, paragraph 13, provides that all nucleotides in a sequence must be represented using the symbols as listed in Table 1: List of Nucleotides Symbols (see MPEP § 2412.03(a)). Only lower-case letters must be used. Any symbol used to represent a nucleotide is the equivalent of only one residue.
WIPO Standard ST.26, paragraph 14, sets forth that the symbol “t” will be construed as thymine in deoxyribonucleic acid (DNA) and uracil in ribonucleic acid (RNA). Uracil in DNA or thymine in RNA is considered a modified nucleotide and must be further described in a feature table. See MPEP § 2413.01(g), subsection I for more detail regarding a “feature table.”
WIPO Standard ST.26, paragraph 15, provides that where an ambiguity symbol (representing two or more alternative nucleotides) is appropriate, the most restrictive symbol should be used, as listed in Table 1: List of Nucleotides Symbols (see MPEP § 2412.03(a)). For example, if a nucleotide in a given position could be “a” or “g”, then “r” should be used, rather than “n”. The symbol “n” will be construed as any one of “a”, “c”, “g”, or “t/u” except where it is used with a further description in a feature table. The symbol “n” must not be used to represent anything other than a nucleotide. A single modified or “unknown” nucleotide may be represented by the symbol “n”, together with a further description in a feature table. See MPEP § 2413.01(g), subsection I, for more detail regarding a “feature table.” For representation of sequence variants, i.e., alternatives, deletions, insertions or substitutions relative to a primary sequence, see MPEP § 2412.05(c); and also MPEP § 2413.01(g), subsection XII for information on variants.
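Selection of the most restrictive ambiguity symbol can be expressed as a lookup over the definitions in Table 1. The following Python sketch is illustrative: the function name is hypothetical, and “u” is treated as equivalent to “t” because Table 1 pairs them as “t/u”.

```python
# Table 1 symbols mapped to the bases each one represents.
SYMBOL_DEFINITIONS = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "m": "ac", "r": "ag", "w": "at", "s": "cg", "y": "ct", "k": "gt",
    "v": "acg", "h": "act", "d": "agt", "b": "cgt",
    "n": "acgt",
}

# Inverse lookup: each distinct set of bases has exactly one symbol.
SYMBOL_BY_BASES = {frozenset(bases): symbol
                   for symbol, bases in SYMBOL_DEFINITIONS.items()}


def most_restrictive_symbol(possible_bases) -> str:
    """Return the Table 1 symbol covering exactly the given alternatives."""
    normalized = frozenset("t" if base == "u" else base
                           for base in possible_bases)
    if normalized not in SYMBOL_BY_BASES:
        raise ValueError("expected a non-empty selection from a, c, g, t/u")
    return SYMBOL_BY_BASES[normalized]
```

For the example in the text, a position that could be “a” or “g” maps to “r”; “n” is returned only when all four bases are possible.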
WIPO Standard ST.26, paragraph 16, sets forth that modified nucleotides should be represented in the sequence as the corresponding unmodified nucleotides, i.e., “a”, “c”, “g” or “t” whenever possible. Any modified nucleotide in a sequence that cannot otherwise be represented by any other symbol in Table 1: List of Nucleotides Symbols (see MPEP § 2412.03(a)), i.e., an “other” nucleotide, such as a non-naturally occurring nucleotide, must be represented by the symbol “n”. The symbol “n” is the equivalent of only one residue.
WIPO Standard ST.26, paragraph 19, specifies that uracil in DNA or thymine in RNA are considered modified nucleotides and must be represented in the sequence as “t” and be further described in a feature table using the feature key “modified_base”, the qualifier “mod_base” with “OTHER” as the qualifier value and the qualifier “note” with “uracil” or “thymine”, respectively, as the qualifier value. See MPEP § 2413.01(g), subsection I for more detail regarding a “feature table.”
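A minimal sketch of how such a feature might be assembled follows, using Python's standard xml.etree.ElementTree module. The feature key and qualifier values come from the paragraph above. The child element names (INSDFeature_key, INSDFeature_quals, INSDQualifier, INSDQualifier_name, INSDQualifier_value) are assumed to match the ST.26 DTD and should be confirmed there; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def make_feature(key: str, location: str, qualifiers: list[tuple[str, str]]) -> ET.Element:
    """Build one INSDFeature element (element names assumed from the ST.26 DTD)."""
    feature = ET.Element("INSDFeature")
    ET.SubElement(feature, "INSDFeature_key").text = key
    ET.SubElement(feature, "INSDFeature_location").text = location
    quals = ET.SubElement(feature, "INSDFeature_quals")
    for name, value in qualifiers:
        qualifier = ET.SubElement(quals, "INSDQualifier")
        ET.SubElement(qualifier, "INSDQualifier_name").text = name
        ET.SubElement(qualifier, "INSDQualifier_value").text = value
    return feature


def unexpected_base_feature(position: int, base: str) -> ET.Element:
    """Describe uracil in DNA or thymine in RNA at a residue shown as "t"."""
    if base not in ("uracil", "thymine"):
        raise ValueError("base must be 'uracil' or 'thymine'")
    return make_feature(
        "modified_base",
        str(position),
        [("mod_base", "OTHER"), ("note", base)],
    )


xml_text = ET.tostring(unexpected_base_feature(7, "uracil"), encoding="unicode")
print(xml_text)
```

The residue at position 7 would still appear as "t" in the sequence itself; only the feature table records that it is uracil.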
WIPO Standard ST.26, paragraph 21, provides that any “unknown” nucleotide must be represented by the symbol “n” in the sequence. An “unknown” nucleotide should be further described in a feature table using the feature key “unsure”. The symbol “n” is the equivalent of only one residue. See MPEP § 2413.01(g), subsection I, for more detail regarding a “feature table.”
III. DESCRIPTION OF MODIFIED NUCLEOTIDES WITHIN A NUCLEOTIDE SEQUENCE
WIPO Standard ST.26, paragraph 17, specifies that a modified nucleotide must be further described in a feature table (see MPEP § 2413.01(g), subsection I, for more detail regarding a “feature table”) using the feature key “modified_base” and the mandatory qualifier “mod_base” in conjunction with a single abbreviation from Table 2: List of Modified Nucleotides, below, as the qualifier value. See MPEP § 2413.01(g), subsections II and III, for more information regarding use of a feature key; and MPEP § 2413.01(g), subsections V and VI, for more information regarding use of a qualifier. If the abbreviation is “OTHER”, the complete unabbreviated name of the modified nucleotide must be provided as the value in a “note” qualifier. For a listing of alternative modified nucleotides, the qualifier value “OTHER” may be used in conjunction with a further “note” qualifier. The abbreviations (or full names) provided in Table 2 must not be used in the sequence itself.
WIPO Standard ST.26, paragraph 18, describes that a nucleotide sequence including one or more regions of consecutive modified nucleotides that share the same backbone moiety must be further described in a feature table as required for a modified nucleotide. See MPEP § 2413.01(g), subsection I, for information regarding a feature table and MPEP § 2412.03(e) regarding modified nucleotides. The modified nucleotides of each such region may be jointly described in a single INSDFeature element of a “feature table” as described below. See MPEP § 2413.01(g), subsection I, for information regarding INSDFeature elements of a feature table. The most restrictive unabbreviated chemical name that encompasses all of the modified nucleotides in the range or a list of the chemical names of all the nucleotides in the range must be provided as the value in the “note” qualifier. For example, a glycol nucleic acid sequence containing “a”, “c”, “g”, or “t” nucleobases may be described in the “note” qualifier as “2,3-dihydroxypropyl nucleosides.” Alternatively, the same sequence may be described in the “note” qualifier as “2,3-dihydroxypropyladenine, 2,3-dihydroxypropylthymine, 2,3-dihydroxypropylguanine, or 2,3-dihydroxypropylcytosine.” Where an individual modified nucleotide in the region includes an additional modification, then the modified nucleotide must also be further described in the feature table as required for a modified nucleotide.
WIPO Standard ST.26, paragraph 19, provides that uracil in DNA or thymine in RNA are considered modified nucleotides and must be represented in the sequence as “t” and be further described in a feature table using the feature key “modified_base”, the qualifier “mod_base” with “OTHER” as the qualifier value and the qualifier “note” with “uracil” or “thymine”, respectively, as the qualifier value. See MPEP § 2413.01(g), subsections II and III, for more information regarding use of a feature key; and MPEP § 2413.01(g), subsections V and VI, for more information regarding use of a qualifier.
Abbreviation | Definition |
---|---|
ac4c | 4-acetylcytidine |
chm5u | 5-(carboxyhydroxymethyl)uridine |
cm | 2'-O-methylcytidine |
cmnm5s2u | 5-carboxymethylaminomethyl-2-thiouridine |
cmnm5u | 5-carboxymethylaminomethyluridine |
dhu | dihydrouridine |
fm | 2'-O-methylpseudouridine |
gal q | beta, D-galactosylqueuosine |
gm | 2'-O-methylguanosine |
i | inosine |
i6a | N6-isopentenyladenosine |
m1a | 1-methyladenosine |
m1f | 1-methylpseudouridine |
m1g | 1-methylguanosine |
m1i | 1-methylinosine |
m22g | 2,2-dimethylguanosine |
m2a | 2-methyladenosine |
m2g | 2-methylguanosine |
m3c | 3-methylcytidine |
m4c | N4-methylcytosine |
m5c | 5-methylcytidine |
m6a | N6-methyladenosine |
m7g | 7-methylguanosine |
mam5u | 5-methylaminomethyluridine |
mam5s2u | 5-methoxyaminomethyl-2-thiouridine |
man q | beta, D-mannosylqueuosine |
mcm5s2u | 5-methoxycarbonylmethyl-2-thiouridine |
mcm5u | 5-methoxycarbonylmethyluridine |
mo5u | 5-methoxyuridine |
ms2i6a | 2-methylthio-N6-isopentenyladenosine |
ms2t6a | N-((9-beta-D-ribofuranosyl-2-methylthiopurine-6-yl)carbamoyl)threonine |
mt6a | N-((9-beta-D-ribofuranosylpurine-6-yl)N-methylcarbamoyl)threonine |
mv | uridine-5-oxyacetic acid-methylester |
o5u | uridine-5-oxyacetic acid |
osyw | wybutoxosine |
p | pseudouridine |
q | queuosine |
s2c | 2-thiocytidine |
s2t | 5-methyl-2-thiouridine |
s2u | 2-thiouridine |
s4u | 4-thiouridine |
m5u | 5-methyluridine |
t6a | N-((9-beta-D-ribofuranosylpurine-6-yl)-carbamoyl)threonine |
tm | 2'-O-methyl-5-methyluridine |
um | 2'-O-methyluridine |
yw | wybutosine |
x | 3-(3-amino-3-carboxy-propyl)uridine, (acp3)u |
OTHER | (requires note qualifier) |
Reproduced from WIPO Standard ST.26, Annex I, Section 2.
IV. JOINTLY DESCRIBING A REGION OF A NUCLEOTIDE SEQUENCE
WIPO Standard ST.26, paragraph 22, specifies that a region containing a known number of contiguous “a”, “c”, “g”, “t”, or “n” residues for which the same description applies may be jointly described using a single INSDFeature element with the syntax “x..y” as the location descriptor in the element INSDFeature_location. See MPEP § 2413.01(g), subsection IV, for information regarding INSDFeature_location. For representation of sequence variants, i.e., alternatives, deletions, insertions or substitutions, see MPEP § 2412.05(c) and MPEP § 2413.01(g), subsection XI.
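As a rough illustration of the “x..y” location syntax, the sketch below finds each run of a given symbol in a sequence and formats its 1-based start and end positions. It assumes that a single-residue location is written as the bare position number, and it only locates runs; whether the same description actually applies to every residue in a run is a judgment the code cannot make. The function name is hypothetical.

```python
import re


def run_locations(residues: str, symbol: str = "n") -> list[str]:
    """Return location descriptors for each maximal run of `symbol`."""
    locations = []
    for match in re.finditer(re.escape(symbol) + "+", residues):
        start = match.start() + 1  # convert the 0-based index to a 1-based position
        end = match.end()          # exclusive 0-based end equals inclusive 1-based end
        # Assumption: a single residue is located by its position number alone.
        locations.append(str(start) if start == end else f"{start}..{end}")
    return locations


print(run_locations("acgnnnnntgcan"))
```

For the example sequence, the five contiguous “n” residues at positions 4 through 8 yield “4..8”, which could serve as the value of one INSDFeature_location element.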
2412.05(c) Representation and Inclusion of Variants [R-01.2024]
[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). Formatting representations of XML (eXtensible Markup Language) elements in this section appear different than shown in Standard ST.26, which may be accessed at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf.]
A primary sequence and any variant of that sequence, each disclosed by enumeration of its residues and meeting the definition in 37 CFR 1.831(a) and 1.831(b), must each be included in the “Sequence Listing XML” and assigned its own sequence identifier. Where a variant sequence is disclosed as a single sequence with enumerated alternative residues at one or more positions, it must be included in the “Sequence Listing XML” and should be represented by a single sequence, wherein the enumerated alternative residues are represented by the most restrictive ambiguity symbol. Any variant sequence, disclosed only by reference to deletion(s), insertion(s), or substitution(s) in a primary sequence, should be included in the “Sequence Listing XML”. The table below indicates the proper use of feature keys and qualifiers for nucleic acid and amino acid sequence variants:
Type of sequence | Feature Key | Qualifier | Use |
---|---|---|---|
Nucleic acid | variation | replace or note | Naturally occurring mutations and polymorphisms, e.g., alleles, RFLPs. |
Nucleic acid | misc_difference | replace or note | Variability introduced artificially, e.g., by genetic manipulation or by chemical synthesis. |
Amino acid | VAR_SEQ | note | Variant produced by alternative splicing, alternative promoter usage, alternative initiation and ribosomal frameshifting. |
Amino acid | VARIANT | note | Any type of variant for which VAR_SEQ is not applicable. |
Reproduced from paragraph 96 of WIPO Standard ST.26.
For additional information about feature keys and qualifiers, see MPEP § 2413.01(g), subsections II, III and V.
For additional information about the representation of sequence variants in a “Sequence Listing XML,” see MPEP § 2413.01(g), subsection XI.
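The variant table above lends itself to a simple lookup. The sketch below is illustrative only: the feature keys and qualifiers are taken from the table, while the origin labels ("natural", "artificial", "alternative_processing", "other") and the function name are hypothetical shorthand for the “Use” column.

```python
# Keys: (type of sequence, hypothetical origin label).
# Values: (feature key, permitted qualifiers), as listed in the table.
VARIANT_FEATURES = {
    ("nucleic acid", "natural"): ("variation", ("replace", "note")),
    ("nucleic acid", "artificial"): ("misc_difference", ("replace", "note")),
    # alternative splicing, promoter usage, initiation, or ribosomal frameshifting
    ("amino acid", "alternative_processing"): ("VAR_SEQ", ("note",)),
    ("amino acid", "other"): ("VARIANT", ("note",)),
}


def variant_feature(sequence_type: str, origin: str) -> tuple[str, tuple[str, ...]]:
    """Look up the feature key and qualifiers for a variant annotation."""
    try:
        return VARIANT_FEATURES[(sequence_type, origin)]
    except KeyError:
        raise ValueError(f"no table row for {sequence_type!r} / {origin!r}") from None


print(variant_feature("nucleic acid", "artificial"))
print(variant_feature("amino acid", "other"))
```

Keeping the table as data rather than branching logic makes it easy to compare the code against paragraph 96 row by row.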
2412.05(d) Representation and Symbols of Amino Acid Sequence Data [R-01.2024]
[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). Formatting representations of XML (eXtensible Markup Language) elements in this section appear different than shown in Standard ST.26, which may be accessed at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf.]
37 CFR 1.832 Representation of nucleotide and/or amino acid sequence data in the “Sequence Listing XML” part of a patent application filed on or after July 1, 2022.
-
*****
- (c) The representation and symbols for amino acid sequence data shall conform to the requirements of paragraphs (c)(1) through (4) of this section.
- (1) The amino acids in an amino acid sequence must be represented in the manner described in paragraphs 24 and 25 of WIPO Standard ST.26.
- (2) All amino acids, including modified amino acids and “unknown” amino acids, within an amino acid sequence must be represented using the symbols set forth in paragraphs 26–29 and 32 of WIPO Standard ST.26.
- (3) Modified amino acids within an amino acid sequence must be described in the manner discussed in paragraphs 29 and 30 of WIPO Standard ST.26.
- (4) A region containing a known number of contiguous “X” residues for which the same description applies may be jointly described in the manner described in paragraph 34 of WIPO Standard ST.26.
-
*****
WIPO Standard ST.26, paragraph 24, specifies that the amino acids in an amino acid sequence must be represented in the amino to carboxy direction from left to right. The amino and carboxy groups must not be represented in the sequence.
WIPO Standard ST.26, paragraph 25, indicates that the first amino acid in the sequence is residue position number 1, including amino acids preceding the mature protein, for example, pre-sequences, pro-sequences, pre-pro-sequences and signal sequences. When an amino acid sequence is circular in configuration and the ring consists solely of amino acid residues linked by peptide bonds, i.e., the sequence has no amino and carboxy termini, applicant must choose the amino acid in residue position number 1. Numbering is continuous through the entire sequence in the amino to carboxy direction.
II. SYMBOLS FOR AN AMINO ACID SEQUENCE
WIPO Standard ST.26, paragraph 26, specifies that all amino acids in a sequence must be represented using the symbols set forth in Table 3: List of Amino Acids Symbols, in MPEP § 2412.03(a) above. Only uppercase letters must be used. Any symbol used to represent an amino acid is the equivalent of only one residue.
WIPO Standard ST.26, paragraph 27, indicates that where an ambiguity symbol (representing two or more amino acids in the alternative) is appropriate, the most restrictive symbol should be used, as listed in Table 3: List of Amino Acids Symbols (MPEP § 2412.03(a)). For example, if an amino acid in a given position could be aspartic acid or asparagine, the symbol “B” should be used, rather than “X”. The symbol “X” will be construed as any one of “A”, “R”, “N”, “D”, “C”, “Q”, “E”, “G”, “H”, “I”, “L”, “K”, “M”, “F”, “P”, “O”, “S”, “U”, “T”, “W”, “Y”, or “V”, except where it is used with a further description in the feature table. The symbol “X” must not be used to represent anything other than an amino acid. A single modified or “unknown” amino acid may be represented by the symbol “X”, together with a further description in a feature table (see MPEP § 2413.01(g), subsection I or MPEP § 2412.03(c), for more detail regarding a “feature table”). For representation and inclusion of sequence variants, see MPEP § 2412.05(c). For details of how to represent variants in a “Sequence Listing XML,” see MPEP § 2413.01(g), subsection XI.
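The same “most restrictive symbol” idea can be sketched for amino acids. The 22 unambiguous symbols and the meaning of “B” come from the paragraph above; the meanings of “Z” (glutamine or glutamic acid) and “J” (leucine or isoleucine) are assumptions based on the usual content of Table 3 and should be checked against it. The function name is hypothetical.

```python
# The 22 symbols that "X" is construed to cover, as listed in the text.
UNAMBIGUOUS = frozenset("ARNDCQEGHILKMFPOSUTWYV")
# "B" is given in the text; "Z" and "J" are assumed from Table 3.
AMBIGUITY_SYMBOLS = {
    frozenset("DN"): "B",
    frozenset("QE"): "Z",
    frozenset("LI"): "J",
}


def most_restrictive_amino_acid_symbol(alternatives) -> str:
    """Return the narrowest symbol covering the given alternative amino acids."""
    residues = frozenset(alternatives)
    if not residues or not residues <= UNAMBIGUOUS:
        raise ValueError("alternatives must be unambiguous one-letter symbols")
    if len(residues) == 1:
        return next(iter(residues))
    # Fall back to "X" only when no narrower ambiguity symbol fits exactly.
    return AMBIGUITY_SYMBOLS.get(residues, "X")


print(most_restrictive_amino_acid_symbol({"D", "N"}))  # the example from the text
```

Unlike the nucleotide case, most combinations have no dedicated symbol, so “X” with a further description in the feature table is the usual fallback.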
WIPO Standard ST.26, paragraph 28, specifies that disclosed amino acid sequences separated by internal terminator symbols, represented for example by “Ter” or asterisk “*” or period “.” or a blank space, must be included as separate sequences for each enumerated amino acid sequence that contains at least four specifically defined amino acids and is encompassed by the description of sequences found in MPEP § 2412.03. Each such separate sequence must be assigned its own sequence identifier (see MPEP § 2412.05(a)). Terminator symbols and spaces must not be included in a sequence contained in a “Sequence Listing XML”.
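A minimal sketch of this splitting step follows. It assumes the disclosed sequence is written in upper-case one-letter symbols, that terminators appear as “Ter”, “*”, “.”, or whitespace, and that a residue counts as “specifically defined” when it is anything other than “X”. The function name is hypothetical, and the broader description of sequences in MPEP § 2412.03 is not checked here.

```python
import re

# Terminator notations named in the text: "Ter", "*", ".", or a blank space.
# Adjacent terminators are treated as a single break.
TERMINATOR_RUN = re.compile(r"(?:Ter|[*.\s])+")
MIN_DEFINED_AMINO_ACIDS = 4


def split_on_terminators(disclosed: str) -> list[str]:
    """Split at terminators and keep pieces with enough defined residues."""
    pieces = [piece for piece in TERMINATOR_RUN.split(disclosed) if piece]
    return [
        piece
        for piece in pieces
        # Assumption: every symbol except "X" is a specifically defined residue.
        if sum(1 for residue in piece if residue != "X") >= MIN_DEFINED_AMINO_ACIDS
    ]


print(split_on_terminators("MKTAYIAK*GX.ACDE Ter XXXXA"))
```

Each returned piece would then be entered as a separate sequence with its own sequence identifier, with no terminator symbols or spaces carried over.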
Any “unknown” amino acid must be represented by the symbol “X” in the sequence. An “unknown” amino acid designated as “X” must be further described in a feature table (see MPEP § 2413.01(g), subsection I, for more detail regarding a “feature table”) using the feature key “UNSURE” and optionally the qualifier “note.” The symbol “X” is the equivalent of only one residue (WIPO Standard ST.26, paragraph 32).
III. DESCRIPTION OF MODIFIED AMINO ACIDS WITHIN AN AMINO ACID SEQUENCE
WIPO Standard ST.26, paragraph 29, specifies that modified amino acids, including D-amino acids, should be represented in the sequence as the corresponding unmodified amino acids whenever possible. Any modified amino acid in a sequence that cannot otherwise be represented by any other symbol in Table 3: List of Amino Acids Symbols (see MPEP § 2412.03(a)), i.e., an “other” amino acid, must be represented by “X”. The symbol “X” is the equivalent of only one residue.
WIPO Standard ST.26, paragraph 30, provides that a modified amino acid must be further described in a feature table (see MPEP § 2413.01(g), subsection I, for more detail regarding a “feature table”). Where applicable, the feature keys “CARBOHYD” or “LIPID” should be used together with the qualifier “note”. The feature key “MOD_RES” should be used for other post-translationally modified amino acids together with the qualifier “note”. The feature key “SITE” together with the qualifier “note” should be used when the modified amino acid is not a post-translationally modified amino acid. The value for the qualifier “note” must either be an abbreviation set forth in Table 4: List of Modified Amino Acids (see MPEP § 2412.03(c)), above, or the complete, unabbreviated name of the modified amino acid. The abbreviations set forth in Table 4, or the complete, unabbreviated names must not be used in the sequence itself.
IV. JOINTLY DESCRIBING A REGION OF AN AMINO ACID SEQUENCE
WIPO Standard ST.26, paragraph 34, provides that a region containing a known number of contiguous “X” residues for which the same description applies may be jointly described in one feature key using the syntax “x..y” as the location descriptor in the element INSDFeature_location (see MPEP § 2413.01(g), subsections II-III, for information regarding “feature keys” and subsection IV, for information regarding INSDFeature_location). For representation and inclusion of sequence variants, see MPEP § 2412.05(c). For details of how to represent variants in a “Sequence Listing XML,” see MPEP § 2413.01(g), subsection XII.
2412.05(e) Presentation of Special Situations [R-01.2024]
[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b).]
37 CFR 1.832 Representation of nucleotide and/or amino acid sequence data in the “Sequence Listing XML” part of a patent application filed on or after July 1, 2022.
-
*****
- (d) A nucleotide and/or amino acid sequence that is constructed as a single continuous sequence derived from one or more non-contiguous segments of a larger sequence or of segments from different sequences must be listed in the “Sequence Listing XML” in the manner described in paragraph 35 of WIPO Standard ST.26.
- (e) A nucleotide and/or amino acid sequence that contains regions of specifically defined residues separated by one or more regions of contiguous “n” or “X” residues, wherein the exact number of “n” or “X” residues in each region is disclosed, must be listed in the “Sequence Listing XML” in the manner described in paragraph 36 of WIPO Standard ST.26.
- (f) A nucleotide and/or amino acid sequence that contains regions of specifically defined residues separated by one or more gaps of an unknown or undisclosed number of residues must be listed in the “Sequence Listing XML” in the manner described in paragraph 37 of WIPO Standard ST.26.
WIPO Standard ST.26, paragraph 35, describes that a sequence disclosed by enumeration of its residues that is constructed as a single continuous sequence from one or more non-contiguous segments of a larger sequence or of segments from different sequences must be included in the “Sequence Listing XML” and assigned its own sequence identifier, as defined in 37 CFR 1.832(a). See MPEP § 2412.05(a).
WIPO Standard ST.26, paragraph 36, describes that a sequence that contains regions of specifically defined residues separated by one or more regions of contiguous “n” or “X” residues, wherein the exact number of “n” or “X” residues in each region is disclosed, must be included in the “Sequence Listing XML” as one sequence and assigned its own sequence identifier.
WIPO Standard ST.26, paragraph 37, describes that a sequence that contains regions of specifically defined residues separated by one or more gaps of an unknown or undisclosed number of residues must not be represented in the “Sequence Listing XML” as a single sequence. Each region of specifically defined residues (as encompassed by the definition in 37 CFR 1.831(a)) must be included in the “Sequence Listing XML” as a separate sequence and assigned its own sequence identifier.
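The difference between paragraphs 36 and 37 can be sketched as follows: a gap of known length is filled with that many “n” (or “X”) residues and the regions stay in one sequence, while a gap of unknown length forces a break into separate sequences. The function name and the data layout (a list of regions plus a list of gap lengths, with None standing for “unknown or undisclosed”) are assumptions made for this illustration.

```python
from typing import Optional


def sequences_for_listing(
    regions: list[str],
    gaps: list[Optional[int]],
    filler: str = "n",
) -> list[str]:
    """Join regions across known-length gaps; break at unknown-length gaps.

    gaps[i] is the number of residues between regions[i] and regions[i + 1],
    or None when that number is unknown or undisclosed.
    """
    if not regions or len(gaps) != len(regions) - 1:
        raise ValueError("need one gap entry between each pair of regions")
    sequences = [regions[0]]
    for gap, region in zip(gaps, regions[1:]):
        if gap is None:
            sequences.append(region)                 # unknown gap: separate sequence
        else:
            sequences[-1] += filler * gap + region   # known gap: fill and continue
    return sequences


print(sequences_for_listing(["acgtacgtac", "ggccttaagg", "ttgacctgaa"], [3, None]))
```

For an amino acid sequence the same sketch would be called with filler="X". Each resulting sequence would receive its own sequence identifier, and each still has to meet the definition in 37 CFR 1.831, which this sketch does not check.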
2412.06 The Requirement for Exclusive Conformance; Sequences Presented in Drawing Figures [R-01.2024]
[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b).]
For all applications that disclose a nucleotide sequence and/or amino acid sequence(s) by enumeration of its residues, as defined in 37 CFR 1.831(b), 37 CFR 1.831(a) requires conformance to the requirements of 37 CFR 1.832 through 37 CFR 1.834 with regard to the manner in which the disclosed nucleotide and/or amino acid sequences are presented and described in the “Sequence Listing XML.” This requirement is necessary to minimize any confusion that could result if more than one format for representing sequence data were employed in a given application.
Pursuant to 37 CFR 1.83(a), sequences that are included in the “Sequence Listing XML” should not be duplicated in the drawings. With the use of feature keys and qualifiers in a “Sequence Listing XML” to represent and describe features of a nucleotide or amino acid sequence, the need to re-present a sequence in a drawing is less critical. However, in certain instances, a significant sequence characteristic may not be readily conveyed by the sequence-associated data of the “Sequence Listing XML” and may need to be depicted in a figure. For example, in view of the fact that the representation of double stranded nucleic acids is not permitted in the “Sequence Listing XML,” many significant nucleic acid features, such as “sticky ends” and the like, may only be shown effectively by reference to a drawing figure. Further, the similarity or homology between/among sequences may only be depicted in an effective manner in a drawing figure. Similarly, drawing figures are recommended for use with amino acid sequences to depict structural features of the corresponding protein, such as epitopes and interaction domains. The situations discussed herein are given by way of example only and there may be many other reasons for including a sequence in a drawing. However, when an enumerated sequence is presented in a drawing, the sequence must still be included in the “Sequence Listing XML” if the sequence falls within the definition set forth in 37 CFR 1.831(b), and a sequence identifier (preceded by “SEQ ID NO:X” or the like) must be used, either in the drawing itself or in the Brief Description of the Drawings.
2412.07 Examination of Patent Applications Claiming Large Numbers of Nucleotide Sequences [R-07.2022]
Content regarding the examination of patent applications claiming large numbers of nucleotide sequences is located in MPEP § 2434.