2413 Content of a “Sequence Listing XML” and Form and Format of the “Sequence Listing XML” File [R-01.2024]

To ensure consistency and uniformity of a “Sequence Listing XML,” 37 CFR 1.833 defines its content, and 37 CFR 1.834 details the technical form and format of the XML file containing the “Sequence Listing XML.”

2413.01 Parts of the “Sequence Listing XML” [R-01.2024]

The constituent parts of the “Sequence Listing XML” are identified in multiple paragraphs of WIPO Standard ST.26 and have been specifically incorporated into the USPTO rules of practice at 37 CFR 1.839.

2413.01(a) The “Sequence Listing XML” is a Single File Encoded Using Unicode UTF-8 [R-01.2024]

[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). Formatting representations of XML (eXtensible Markup Language) elements in this section appear different than shown in Standard ST.26, which may be accessed at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf.]

37 CFR 1.833 Requirements for a “Sequence Listing XML” for nucleotide and/or amino acid sequences as part of a patent application filed on or after July 1, 2022.

(a) The “Sequence Listing XML” as required by § 1.831(a) must be presented as a single file in XML 1.0 encoded using Unicode UTF–8, where the character set complies with paragraphs 40 and 41 and Annex IV of WIPO Standard ST.26 (incorporated by reference, see § 1.839).
*****

According to 37 CFR 1.833, the entire “Sequence Listing XML” must be presented as a single file. WIPO Standard ST.26 specifies that the file must be encoded using Unicode UTF-8, with the following restrictions:

(1) the information contained in the elements ApplicantName, InventorName and InventionTitle of the general information part, and the NonEnglishQualifier_value of the sequence data part, may be composed of any valid Unicode characters indicated in the XML 1.0 specification except the Unicode Control code points 0000-001F and 007F-009F. The reserved characters ", &, ', <, and > (Unicode code points 0022, 0026, 0027, 003C and 003E respectively), must be replaced as set forth in the table below; and
(2) the information contained in all other elements and attributes of the general information part and in all other elements and attributes of the sequence data part must be composed of printable characters (including the space character) from the Unicode Basic Latin code table (i.e., limited to Unicode code points 0020 through 007E). The reserved characters ", &, ', <, and > (Unicode code points 0022, 0026, 0027, 003C and 003E respectively), must be replaced as set forth in the table below (WIPO Standard ST.26, paragraph 40).

See MPEP § 2413.01(f) for details about the “general information part” and MPEP § 2413.01(g) for details about the “sequence data part.”

WIPO Standard ST.26 specifies that in an XML instance of a “Sequence Listing XML”, numeric character references must not be used and the following reserved characters must be replaced by the corresponding predefined entities when used in a value of an attribute or content of an element:

List of Reserved Characters and Predefined Entities
Reserved Character	Predefined Entities
<	&lt;
>	&gt;
&	&amp;
"	&quot;
'	&apos;

Reproduced from WIPO Standard ST.26, paragraph 41. See also footnote 1 to paragraph 41 of WIPO Standard ST.26 for details about “numeric character references.”

The only character entity references permitted are the predefined entities set forth above (WIPO Standard ST.26, paragraph 41).
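
The character rules above can be illustrated with a minimal Python sketch. The function and constant names are hypothetical and are not part of WIPO Standard ST.26 or any USPTO tool; only the five predefined entities and the code point ranges are taken from the text above.

```python
# Hypothetical helper names; the mapping is the five predefined entities
# listed in the table above.
PREDEFINED_ENTITIES = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&apos;",
}


def escape_reserved(text: str) -> str:
    """Replace each reserved character with its predefined entity."""
    # One pass over the characters, so the '&' of an emitted entity
    # is never escaped a second time.
    return "".join(PREDEFINED_ENTITIES.get(ch, ch) for ch in text)


def is_basic_latin_printable(text: str) -> bool:
    """True if every character lies in code points 0020 through 007E (item (2))."""
    return all(0x20 <= ord(ch) <= 0x7E for ch in text)


def has_forbidden_control(text: str) -> bool:
    """True if any character is a control code point excluded by item (1)."""
    return any(ord(ch) <= 0x1F or 0x7F <= ord(ch) <= 0x9F for ch in text)
```

A serializer that already escapes markup should not be combined with escape_reserved, because the ampersands would then be escaped twice.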

2413.01(b) The “Sequence Listing XML” Must Be Valid According To the DTD [R-01.2024]

37 CFR 1.833 Requirements for a “Sequence Listing XML” for nucleotide and/or amino acid sequences as part of a patent application filed on or after July 1, 2022.

*****
(b) The “Sequence Listing XML” presented in accordance with paragraph (a) of this section must further:
- (1) Be valid according to the Document Type Definition (DTD) as presented in WIPO Standard ST.26, Annex II.
*****

37 CFR 1.833(b)(1) incorporates the DTD requirement of WIPO Standard ST.26 such that a “Sequence Listing XML” must conform to the Document Type Definition (the document that sets out the structure and use of elements and attributes in an XML compliant document) presented in Annex II of WIPO Standard ST.26. Use of the WIPO Sequence tool will enable a user to generate a “Sequence Listing XML” that conforms to the DTD in Annex II of WIPO Standard ST.26. See MPEP § 2418.

2413.01(c) The “Sequence Listing XML” Must Contain an XML Declaration [R-01.2024]

37 CFR 1.833 Requirements for a “Sequence Listing XML” for nucleotide and/or amino acid sequences as part of a patent application filed on or after July 1, 2022.

*****
(b) The “Sequence Listing XML” presented in accordance with paragraph (a) of this section must further:
- *****
- (2) Comply with the requirements of WIPO Standard ST.26 to include:
  - (i) An XML declaration as defined in paragraph 39(a) of WIPO Standard ST.26;
*****

WIPO Standard ST.26, paragraph 39(a), specifies that the first line of the XML instance must contain the XML declaration, which is:

<?xml version="1.0" encoding="UTF-8"?>

2413.01(d) The “Sequence Listing XML” Must Contain a DOCTYPE Declaration [R-01.2024]

37 CFR 1.833 Requirements for a “Sequence Listing XML” for nucleotide and/or amino acid sequences as part of a patent application filed on or after July 1, 2022.

*****
(b) The “Sequence Listing XML” presented in accordance with paragraph (a) of this section must further:
- *****
- (2) Comply with the requirements of WIPO Standard ST.26 to include:
  - *****
  - (ii) A document type (DOCTYPE) declaration as defined in paragraph 39(b) of WIPO Standard ST.26;
*****

WIPO Standard ST.26, paragraph 39(b), specifies that the second line of the XML instance must contain the document type (DOCTYPE) declaration, which is:

<!DOCTYPE ST26SequenceListing PUBLIC "-//WIPO//DTD Sequence Listing 1.3//EN" "ST26SequenceListing_V1_3.dtd">
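
As an illustration only, the first two lines of a file can be compared against the XML declaration (see MPEP § 2413.01(c)) and the DOCTYPE declaration quoted above. The function name check_prolog is hypothetical, and the exact string comparison is an assumption of this sketch that is stricter than an XML parser would be.

```python
# The two declarations as quoted in this chapter.
XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>'
DOCTYPE_DECLARATION = (
    "<!DOCTYPE ST26SequenceListing PUBLIC "
    '"-//WIPO//DTD Sequence Listing 1.3//EN" "ST26SequenceListing_V1_3.dtd">'
)


def check_prolog(xml_text: str) -> list[str]:
    """Return a list of problems found in the first two lines (hypothetical helper)."""
    lines = xml_text.splitlines()
    problems = []
    # Only trailing whitespace is tolerated; nothing may precede the declaration.
    if len(lines) < 1 or lines[0].rstrip() != XML_DECLARATION:
        problems.append("line 1 is not the expected XML declaration")
    if len(lines) < 2 or lines[1].rstrip() != DOCTYPE_DECLARATION:
        problems.append("line 2 is not the expected DOCTYPE declaration")
    return problems
```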

2413.01(e) The “Sequence Listing XML” Must Contain a Root Element [R-01.2024]

37 CFR 1.833 Requirements for a “Sequence Listing XML” for nucleotide and/or amino acid sequences as part of a patent application filed on or after July 1, 2022.

*****
(b) The “Sequence Listing XML” presented in accordance with paragraph (a) of this section must further:
- *****
- (2) Comply with the requirements of WIPO Standard ST.26 to include:
  - *****
  - (iii) A root element as defined in paragraph 43 of WIPO Standard ST.26;
*****

WIPO Standard ST.26, paragraph 43, prescribes the root element of an XML instance to have the following attributes:

List of File Attributes
Attribute	Description	Mandatory/Optional
dtdVersion	Version of the DTD used to create this file in the format “V#_#”, e.g., “V1_3”.	Mandatory
fileName	Name of the sequence listing file.	Optional
softwareName	Name of the software that generated this file.	Optional
softwareVersion	Version of the software that generated this file.	Optional
productionDate	Date of production of the sequence listing file (format “CCYY-MM-DD”).	Optional
originalFreeTextLanguageCode	The language code (see reference in paragraph 9 to ISO 639-1:2002) for the single original language in which the language-dependent free text qualifiers were prepared.	Optional
nonEnglishFreeTextLanguageCode	The language code (see reference in paragraph 9 to ISO 639-1:2002) for the NonEnglishQualifier_value elements.	Mandatory when a NonEnglishQualifier_value element is present in the sequence listing

Reproduced from paragraph 43 of WIPO Standard ST.26.
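
The attribute rules in the table above can be checked along the following lines. This is a minimal sketch using Python's standard xml.etree.ElementTree module; the function name check_root_attributes is hypothetical, and the sketch covers only the dtdVersion format, the productionDate format, and the conditional requirement for nonEnglishFreeTextLanguageCode.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date


def check_root_attributes(root: ET.Element) -> list[str]:
    """Return problems found in the root element attributes (hypothetical helper)."""
    problems = []

    # dtdVersion is mandatory and has the format V#_#, e.g. V1_3.
    dtd_version = root.get("dtdVersion")
    if dtd_version is None:
        problems.append("dtdVersion is mandatory")
    elif not re.fullmatch(r"V\d+_\d+", dtd_version):
        problems.append("dtdVersion must have the format V#_#")

    # productionDate is optional; when present it must be a real CCYY-MM-DD date.
    production_date = root.get("productionDate")
    if production_date is not None:
        valid = re.fullmatch(r"\d{4}-\d{2}-\d{2}", production_date) is not None
        if valid:
            try:
                date.fromisoformat(production_date)
            except ValueError:
                valid = False
        if not valid:
            problems.append("productionDate must be a valid CCYY-MM-DD date")

    # nonEnglishFreeTextLanguageCode becomes mandatory when any
    # NonEnglishQualifier_value element is present.
    has_non_english = next(root.iter("NonEnglishQualifier_value"), None) is not None
    if has_non_english and root.get("nonEnglishFreeTextLanguageCode") is None:
        problems.append("nonEnglishFreeTextLanguageCode is mandatory here")

    return problems
```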

2413.01(f) The “Sequence Listing XML” Must Contain a General Information Part [R-01.2024]

37 CFR 1.833 Requirements for a “Sequence Listing XML” for nucleotide and/or amino acid sequences as part of a patent application filed on or after July 1, 2022.

*****
(b) The “Sequence Listing XML” presented in accordance with paragraph (a) of this section must further:
- *****
- (2) Comply with the requirements of WIPO Standard ST.26 to include:
  - *****
  - (iv) A general information part that complies with the requirements of paragraphs 45, 47, and 48, as applicable, of WIPO Standard ST.26
*****

WIPO Standard ST.26 prescribes that the general information part of the sequence listing must contain the following bibliographic elements:

List of Patent Application Information Elements
Element	Description	Mandatory/Optional
ApplicationIdentification	The application identification for which the sequence listing is submitted	Mandatory when a sequence listing is furnished at any time following the assignment of the application number
The ApplicationIdentification is composed of:
IPOfficeCode	ST.3 Code of the office of filing	Mandatory
ApplicationNumberText	The application number as provided by the office of filing (e.g., PCT/IB2013/099999)	Mandatory
FilingDate	The date of filing of the patent application for which the sequence listing is submitted (ST.2 format “CCYY-MM-DD”, using a 4-digit calendar year, a 2-digit calendar month and a 2-digit day within the calendar month, e.g., 2015-01-31).	Mandatory when a sequence listing is furnished at any time following the assignment of a filing date
ApplicantFileReference	A single unique identifier assigned by applicant to identify a particular application, typed in the characters as set forth in paragraph 40 (b).	Mandatory when a sequence listing is furnished at any time prior to assignment of the application number; otherwise, Optional
EarliestPriorityApplicationIdentification	The identification of the earliest priority application (also contains IPOfficeCode, ApplicationNumberText and FilingDate, see ApplicationIdentification above).	Mandatory where priority is claimed
ApplicantName	Name of the first mentioned applicant typed in the characters as set forth in paragraph 40 (a). This element includes the mandatory attribute languageCode as set forth in paragraph 47.	Mandatory
ApplicantNameLatin	Where ApplicantName is typed in characters other than those as set forth in paragraph 40 (b), a translation or transliteration of the name of the first mentioned applicant must also be typed in characters as set forth in paragraph 40 (b).	Mandatory where ApplicantName contains non-Latin characters
InventorName	Name of the first mentioned inventor typed in the characters as set forth in paragraph 40 (a). This element includes the mandatory attribute languageCode as set forth in paragraph 47.	Optional
InventorNameLatin	Where InventorName is typed in characters other than those as set forth in paragraph 40 (b), a translation or transliteration of the first mentioned inventor may also be typed in characters as set forth in paragraph 40 (b).	Optional
InventionTitle	Title of the invention typed in the characters as set forth in paragraph 40 (a) in the language of filing. A translation of the title of the invention into additional languages may be typed in the characters as set forth in paragraph 40 (a) using additional InventionTitle elements. This element includes the mandatory attribute languageCode as set forth in paragraph 48. The title of invention should be between two to seven words.	Mandatory in the language of filing. Optional for additional languages.
SequenceTotalQuantity	The total number of all sequences in the sequence listing including intentionally skipped sequences (also known as empty sequences) (see paragraph 10).	Mandatory

Reproduced from paragraph 45 of WIPO Standard ST.26.

WIPO Standard ST.26, paragraph 47, specifies that the name of the applicant and, optionally, the name of the inventor must be indicated in the element ApplicantName and InventorName, respectively, as they are generally referred to in the language in which the application is filed. The appropriate language code (referenced in WIPO Standard ST.26, paragraph 9, which references the International Standard ISO 639-1:2002 for the codes for the representation of names of languages) must be indicated in the languageCode attribute for each element. Where the applicant name indicated contains characters other than those of the Latin alphabet permitted as set forth in paragraph 40(b), reproduced in MPEP § 2413.01(a) item (2), a transliteration or translation of the applicant name must also be indicated in characters of the Latin alphabet in the element ApplicantNameLatin. Where the inventor name indicated contains characters other than those of the Latin alphabet, a transliteration or a translation of the inventor name may also be indicated in characters of the Latin alphabet in the element InventorNameLatin.

WIPO Standard ST.26, paragraph 48, provides that the title of the invention must be indicated in the element InventionTitle in the language of filing and may also be indicated in additional languages using multiple InventionTitle elements. The appropriate language code (see WIPO Standard ST.26, paragraph 9, which references the International Standard ISO 639-1:2002) must be indicated in the languageCode attribute of the element. See MPEP § 2413.01(i) for more information about the InventionTitle element in a “Sequence Listing XML.”

2413.01(g) The “Sequence Listing XML” Must Contain a Sequence Data Part [R-01.2024]

37 CFR 1.833 Requirements for a “Sequence Listing XML” for nucleotide and/or amino acid sequences as part of a patent application filed on or after July 1, 2022.

*****
(b) The “Sequence Listing XML” presented in accordance with paragraph (a) of this section must further:
- *****
- (2) Comply with the requirements of WIPO Standard ST.26 to include:
  - *****
  - (v) A sequence data part that complies with the requirements of paragraphs 50–55, 57, 58, 60–69, 71–78, 80–87, 89–98, and 100, as applicable, of WIPO Standard ST.26 representing the nucleotide and/or amino acid sequences according to § 1.832.
*****

The sequence data part is the part of the “Sequence Listing XML” that contains each individual nucleotide or amino acid sequence that meets the definition for inclusion in a “Sequence Listing XML” together with sequence-associated data. WIPO Standard ST.26, paragraph 50, specifies that the sequence data part must be composed of one or more SequenceData elements, each element containing information about one sequence.

WIPO Standard ST.26, paragraph 51, specifies that each SequenceData element must have a mandatory attribute sequenceIDNumber, in which the sequence identifier (see MPEP § 2412.05(a)) for each sequence is contained.

WIPO Standard ST.26 specifies that the SequenceData element must contain a dependent element INSDSeq, consisting of further dependent elements as follows:

List of INSDSeq Dependent Elements
Element	Description	Mandatory/Not Included: Sequences	Mandatory/Not Included: Intentionally Skipped Sequences
INSDSeq_length	Length of the sequence	Mandatory	Mandatory with no value
INSDSeq_moltype	Molecule type	Mandatory	Mandatory with no value
INSDSeq_division	Indication that a sequence is related to a patent application	Mandatory with the value “PAT”	Mandatory with no value
INSDSeq_feature-table	List of annotations of the sequence	Mandatory	Must NOT be included
INSDSeq_sequence	Sequence	Mandatory	Mandatory with the value “000”

Reproduced from paragraph 52 of WIPO Standard ST.26.

See MPEP § 2412.05(a) for information about intentionally skipped sequences.
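
The right-hand column of the table above can be illustrated by building the element for an intentionally skipped sequence. This is a sketch under stated assumptions: the function name skipped_sequence is hypothetical, the child elements are emitted in the order in which the table lists them, and the authoritative structure remains the DTD in Annex II of WIPO Standard ST.26.

```python
import xml.etree.ElementTree as ET


def skipped_sequence(seq_id: int) -> ET.Element:
    """Build a SequenceData element for an intentionally skipped sequence
    (hypothetical helper following the table above)."""
    data = ET.Element("SequenceData", sequenceIDNumber=str(seq_id))
    seq = ET.SubElement(data, "INSDSeq")
    # These three elements are mandatory but carry no value.
    for tag in ("INSDSeq_length", "INSDSeq_moltype", "INSDSeq_division"):
        ET.SubElement(seq, tag)
    # INSDSeq_feature-table must NOT be included, so it is never created.
    # INSDSeq_sequence is mandatory with the value "000".
    ET.SubElement(seq, "INSDSeq_sequence").text = "000"
    return data
```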

WIPO Standard ST.26, paragraph 53, specifies that the element INSDSeq_length must disclose the number of nucleotides or amino acids of the sequence contained in the INSDSeq_sequence element.

WIPO Standard ST.26, paragraph 54, specifies that the element INSDSeq_moltype must disclose the type of molecule that is being represented. For nucleotide sequences, including nucleotide analogue sequences, the molecule type must be indicated as DNA or RNA. For amino acid sequences, the molecule type must be indicated as AA.

WIPO Standard ST.26, paragraph 55, specifies that for a nucleotide sequence that contains both DNA and RNA segments of one or more nucleotides, the molecule type must be indicated as DNA. The combined DNA/RNA molecule must be further described in the feature table, using the feature key “source” and the mandatory qualifier “organism” with the value “synthetic construct” and the mandatory qualifier “mol_type” with the value “other DNA.” Each DNA and RNA segment of the combined DNA/RNA molecule must be further described with the feature key “misc_feature” and the qualifier “note,” wherein the qualifier value indicates whether the segment is DNA or RNA.

WIPO Standard ST.26, paragraph 57, specifies that the element INSDSeq_sequence must disclose the sequence. Only the appropriate symbols set forth in Table 1: List of Nucleotides Symbols and Table 3: List of Amino Acids Symbols (see MPEP § 2412.03(a)) must be included in the sequence. The sequence must not include numbers, punctuation or whitespace characters.
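
The relationship between INSDSeq_length and INSDSeq_sequence described in paragraphs 53 and 57 can be sketched as follows. The function name check_sequence is hypothetical. The sketch only confirms that the sequence consists of ASCII letters and that its length matches the declared length; it does not check the symbols against Table 1 or Table 3, and it does not apply to intentionally skipped sequences, whose value is “000”.

```python
def check_sequence(length_text: str, sequence: str) -> list[str]:
    """Return problems found for one sequence (hypothetical helper)."""
    problems = []
    # Numbers, punctuation and whitespace are not permitted in the sequence.
    if not (sequence.isascii() and sequence.isalpha()):
        problems.append("sequence must contain only letter symbols")
    # INSDSeq_length must equal the number of residues in INSDSeq_sequence.
    if not (length_text.isascii() and length_text.isdigit()):
        problems.append("length is not a non-negative integer")
    elif int(length_text) != len(sequence):
        problems.append("length does not match the sequence")
    return problems
```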

I. FEATURE TABLE

According to WIPO Standard ST.26, a “feature table” “contains information on the location and roles of various regions within a particular sequence. A feature table is required for every sequence, except for any intentionally skipped sequence, in which case it must not be included. The feature table is contained in the element INSDSeq_feature-table, which consists of one or more INSDFeature elements.” (WIPO Standard ST.26, paragraph 60).

WIPO Standard ST.26 specifies that each INSDFeature element that comprises the feature table describes one feature, and consists of dependent elements as follows:

List of INSDFeature Dependent Elements
Element	Description	Mandatory/Optional
INSDFeature_key	A word or abbreviation indicating a feature	Mandatory
INSDFeature_location	Region of the sequence which corresponds to the feature	Mandatory
INSDFeature_quals	Qualifier containing auxiliary information about a feature	Mandatory where the feature key requires one or more qualifiers, e.g., source; otherwise, Optional

Reproduced from paragraph 61 of WIPO Standard ST.26.

II. FEATURE KEYS

WIPO Standard ST.26, paragraph 62, specifies that Annex I contains the exclusive listing of feature keys that must be used when preparing and submitting a “Sequence Listing XML,” along with an exclusive listing of associated qualifiers and an indication as to whether those qualifiers are mandatory or optional. Section 5 of Annex I of WIPO Standard ST.26 provides the exclusive listing of feature keys for nucleotide sequences and Section 7 of Annex I of WIPO Standard ST.26 provides the exclusive listing of feature keys for amino acid sequences.

III. MANDATORY FEATURE KEYS

WIPO Standard ST.26, paragraph 63, specifies that the “source” feature key is mandatory for all nucleotide sequences and for all amino acid sequences, except for any intentionally skipped sequence. Each sequence must have a single “source” feature key spanning the entire sequence. Where a sequence originates from multiple sources, those sources may be further described in the feature table, using the feature key “misc_feature” and the qualifier “note” for nucleotide sequences, and the feature key “REGION” and the qualifier “note” for amino acid sequences.

IV. FEATURE LOCATION

WIPO Standard ST.26, paragraph 64, specifies that the mandatory element INSDFeature_location must contain at least one location descriptor, which defines a site or a region corresponding to a feature of the sequence in the INSDSeq_sequence element. Amino acid sequences must contain one and only one location descriptor in the mandatory INSDFeature_location element. Nucleotide sequences may have more than one location descriptor in the mandatory INSDFeature_location element when used in conjunction with one or more location operator(s) (more information about location descriptors is discussed below).

WIPO Standard ST.26, paragraph 65, specifies that the location descriptor can be a single residue number, a region delimiting a contiguous span of residue numbers, or a site or region that extends beyond the specified residue or span of residues. The location descriptor must not include numbering for residues beyond the range of the sequence in the INSDSeq_sequence element. For nucleotide sequences only, a location descriptor can be a site between two adjacent residue numbers. Multiple location descriptors must be used in conjunction with a location operator when a feature corresponds to discontinuous sites or regions of a nucleotide sequence (more information about location descriptors and operators is discussed below).

WIPO Standard ST.26, paragraph 66, specifies that the syntax for each type of location descriptor is indicated in Tables (a)-(c) below, where x and y are residue numbers, indicated as positive integers, not greater than the length of the sequence in the INSDSeq_sequence element, and x is less than y.

(a) Location descriptors for nucleotide and amino acid sequences:

Location Descriptors for Nucleotide and Amino Acid Sequences
Location descriptor type	Syntax	Description
Single residue number	x	Points to a single residue in a sequence.
Residue numbers delimitating a sequence span	x..y	Points to a continuous range of residues bounded by and including the starting and ending residues.
Residues before the first or beyond the last specified residue number	<x, >x, <x..y, x..>y, <x..>y	Points to a region including a specified residue or span of residues and extending beyond a specified residue. The '<' and '>' symbols may be used with a single residue or the starting and ending residue numbers of a span of residues to indicate that a feature extends beyond the specified residue number.

Reproduced from paragraph 66 of WIPO Standard ST.26.

(b) Location descriptors for nucleotide sequences only:

Location Descriptors for Nucleotide Sequence Only
Location descriptor type	Syntax	Description
A site between two adjoining nucleotides	x^y	Points to a site between two adjoining nucleotides, e.g., endonucleolytic cleavage site. The position numbers for the adjacent nucleotides are separated by a caret (^). The permitted formats for this descriptor are x^x+1 (for example 55^56), or, for circular nucleotides, x^1, where “x” is the full length of the molecule, e.g., 1000^1 for a circular molecule with length 1000.

Reproduced from paragraph 66 of WIPO Standard ST.26.

(c) Location descriptors for amino acid sequences only:

Location Descriptors for Amino Acid Sequences Only
Location descriptor type	Syntax	Description
Residue numbers joined by an intrachain cross-link	x..y	Points to amino acids joined by an intrachain linkage when used with a feature that indicates an intrachain cross-link, such as “CROSSLNK” or “DISULFID”.

Reproduced from paragraph 66 of WIPO Standard ST.26.
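
The descriptor syntax in the tables above lends itself to a small pattern-based check. The following is a minimal sketch, not an official validator: the function name is_valid_descriptor and its parameters are hypothetical, the circular flag is an assumption used to decide whether the x^1 form is acceptable, and the input is the descriptor text after the predefined entities have been converted back to “<” and “>”.

```python
import re

_SINGLE = re.compile(r"[<>]?([1-9]\d*)")                 # x, <x, >x
_SPAN = re.compile(r"<?([1-9]\d*)\.\.>?([1-9]\d*)")      # x..y, <x..y, x..>y, <x..>y
_SITE = re.compile(r"([1-9]\d*)\^([1-9]\d*)")            # x^y (nucleotides only)


def is_valid_descriptor(text: str, seq_length: int,
                        nucleotide: bool = True, circular: bool = False) -> bool:
    """Check one location descriptor against a sequence of seq_length residues."""
    match = _SINGLE.fullmatch(text)
    if match:
        return int(match.group(1)) <= seq_length

    match = _SPAN.fullmatch(text)
    if match:
        x, y = int(match.group(1)), int(match.group(2))
        # x must be less than y, and neither may exceed the sequence length.
        return x < y <= seq_length

    match = _SITE.fullmatch(text)
    if match and nucleotide:
        x, y = int(match.group(1)), int(match.group(2))
        if y == x + 1:
            return y <= seq_length
        # Circular molecules may use x^1 where x is the full length.
        return circular and x == seq_length and y == 1

    return False
```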

WIPO Standard ST.26 specifies that the INSDFeature_location element of nucleotide sequences may contain one or more location operators. A location operator is a prefix to either one location descriptor or a combination of location descriptors corresponding to a single but discontinuous feature, and specifies where the location corresponding to the feature on the indicated sequence is found or how the feature is constructed. A list of location operators is provided in the table below with their descriptions. Location operators can be used for nucleotides only.

Location Operators
Location syntax	Location description
join (location, location, ..., location)	The indicated locations are joined (placed end-to-end) to form one contiguous sequence.
order (location, location, ..., location)	The elements are found in the specified order but nothing is implied about whether joining those elements is reasonable.
complement (location)	Indicates that the feature is located on the strand complementary to the sequence span specified by the location descriptor, when read in the 5’ to 3’ direction or in the direction that mimics 5’ to 3’ direction.

Reproduced from paragraph 67 of WIPO Standard ST.26.

WIPO Standard ST.26, paragraph 68, specifies that the join and order location operators require that at least two comma-separated location descriptors be provided. Location descriptors involving sites between two adjacent residues, i.e. x^y, must not be used within a join or order combination of locations. Use of the join location operator implies that the residues described by the location descriptors are physically brought into contact by biological processes (for example, the exons that contribute to a coding region feature).

WIPO Standard ST.26, paragraph 69, specifies that the location operator “complement” can be used in combination with either “join” or “order” within the same location. Combinations of “join” and “order” within the same location must not be used. See the examples in paragraph 70 of WIPO Standard ST.26.
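
The operator rules of paragraphs 68 and 69 can be sketched as a simple structural check. The function name check_operators is hypothetical, the location strings in the usage note are illustrative and not taken from the standard, and the sketch does not validate the individual location descriptors themselves.

```python
import re


def check_operators(location: str) -> list[str]:
    """Return problems with join/order usage in one location (hypothetical helper)."""
    problems = []
    compact = "".join(location.split())  # ignore any whitespace

    # join and order must not be combined within the same location.
    if "join(" in compact and "order(" in compact:
        problems.append("join and order must not be combined in one location")

    for match in re.finditer(r"(join|order)\(", compact):
        operator = match.group(1)
        depth, commas, pos = 1, 0, match.end()
        # Walk to the matching close parenthesis, counting only the commas
        # that separate this operator's own arguments.
        while pos < len(compact) and depth > 0:
            ch = compact[pos]
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == "," and depth == 1:
                commas += 1
            pos += 1
        if depth != 0:
            problems.append(f"{operator}: unbalanced parentheses")
            continue
        body = compact[match.end():pos - 1]
        if commas < 1:
            problems.append(f"{operator} requires at least two location descriptors")
        if "^" in body:
            problems.append(f"{operator} must not contain an x^y descriptor")
    return problems
```

For example, check_operators("complement(join(2691..4571,4918..5163))") reports no problems, while check_operators("join(12..78)") reports that at least two location descriptors are required.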

WIPO Standard ST.26, paragraph 71, specifies that in an XML instance of a “Sequence Listing XML,” the characters “<” and “>” in a location descriptor must be replaced by the appropriate predefined entities, “&lt;” and “&gt;”, respectively (see MPEP § 2413.01(a) regarding the predefined entities).

V. FEATURE QUALIFIERS

WIPO Standard ST.26, paragraph 72, specifies that qualifiers are used to supply information about features in addition to that conveyed by the feature key and feature location. There are three types of value formats to accommodate different types of information conveyed by qualifiers, namely:

(a) free text (see MPEP §§ 2413.01(g), subsection IX and 2413.01(h), for more detail about “free text”);
(b) controlled vocabulary or enumerated values (e.g., a number or date); and
(c) sequences.

WIPO Standard ST.26, paragraph 73, specifies that Section 6 of Annex I contains the exclusive listing of qualifiers and their specified value formats, if any, for each nucleotide sequence feature key and Section 8 of Annex I contains the exclusive listing of qualifiers and their specified value formats, if any, for each amino acid sequence feature key.

WIPO Standard ST.26, paragraph 74, specifies that any sequence encompassed by 37 CFR 1.831(b) (see MPEP § 2412.03) that is provided as a qualifier value must be separately included in the “Sequence Listing XML” and assigned its own sequence identifier as described in MPEP § 2412.05(a).

VI. MANDATORY FEATURE QUALIFIERS

WIPO Standard ST.26, paragraph 75, specifies that one mandatory feature key, i.e., “source” requires two mandatory qualifiers, “organism” and “mol_type.” Some optional feature keys also require mandatory qualifiers. See Annex I of WIPO Standard ST.26, Sections 5 and 7, for listings of feature keys with mandatory qualifiers.

VII. QUALIFIER ELEMENTS

WIPO Standard ST.26 specifies that the element INSDFeature_quals contains one or more INSDQualifier elements. Each INSDQualifier element represents a single qualifier and consists of three dependent elements and one optional attribute, as shown below:

List of INSDQualifier Dependent Elements
Element/Attribute	Description	Mandatory/Optional
INSDQualifier_name	Name of the qualifier (see Annex I, Sections 6 and 8).	Mandatory
INSDQualifier_value	Value of the qualifier, if any, in the specified format (see Annex I, Sections 6 and 8) and composed in the characters as set forth in paragraph 40(b).	Mandatory, when specified (see paragraph 87 and Annex I, Sections 6 and 8)
NonEnglishQualifier_value	Value of the qualifier, if any, in the specified format (see Annex I, Sections 6 and 8) and composed in the characters as set forth in paragraph 40(a).	Mandatory, when specified (see paragraph 87 and Annex I, Sections 6 and 8)
id	A qualifier with a language-dependent free text value may be uniquely identified by using the optional XML attribute 'id' in the element INSDQualifier (see paragraph 87(d)). The value of the 'id' attribute must start with the letter 'q' and continue with any positive integer. The value of an 'id' attribute must be unique to one INSDQualifier element, i.e. the attribute value must only be used once in a sequence listing file.	Optional

Reproduced from paragraph 76 of WIPO Standard ST.26.
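
The rules for the optional id attribute in the table above (the value starts with the letter “q”, continues with a positive integer, and is used only once per file) can be checked as in the following sketch. The function name check_qualifier_ids is hypothetical, and leading zeros are rejected as an assumption of this sketch.

```python
import re
import xml.etree.ElementTree as ET


def check_qualifier_ids(root: ET.Element) -> list[str]:
    """Return problems with INSDQualifier id attributes (hypothetical helper)."""
    problems = []
    seen = set()
    for qualifier in root.iter("INSDQualifier"):
        qualifier_id = qualifier.get("id")
        if qualifier_id is None:
            continue  # the attribute is optional
        # 'q' followed by a positive integer, e.g. q1, q25.
        if not re.fullmatch(r"q[1-9]\d*", qualifier_id):
            problems.append(f"malformed id: {qualifier_id!r}")
        # Each value may be used only once in the file.
        if qualifier_id in seen:
            problems.append(f"duplicate id: {qualifier_id!r}")
        seen.add(qualifier_id)
    return problems
```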

VIII. ORGANISM AND MOL_TYPE QUALIFIERS

WIPO Standard ST.26, paragraph 77, specifies that the organism qualifier, i.e., “organism” for nucleotide sequences (see Annex I, Section 6, of WIPO Standard ST.26 and Table 5: List of Qualifier Values for Nucleotide Sequences with Language-Dependent Free-Text Values, reproduced in MPEP § 2413.01(h)) and “organism” for amino acid sequences (see Annex I, Section 8, of WIPO Standard ST.26 and Table 6: List of Qualifiers for Amino Acid Sequences with Language-Dependent Free Text Values, reproduced in MPEP § 2413.01(h)), must disclose the source, i.e., a single organism or origin, of the sequence. Organism designations should be selected from a taxonomy database.

WIPO Standard ST.26, paragraph 78, specifies that if the sequence is naturally occurring and the source organism has a Latin genus and species designation, that designation must be used as the qualifier value. The preferred English common name may be specified using the qualifier “note” for nucleotide sequences and amino acid sequences, but must not be used in the organism qualifier value.

WIPO Standard ST.26, paragraph 80, specifies that if the sequence is naturally occurring and the source organism has a known Latin genus, but the species is unspecified or unidentified, then the organism qualifier value must indicate the Latin genus followed by “sp.”

WIPO Standard ST.26, paragraph 81, specifies that if the sequence is naturally occurring, but the Latin organism genus and species designation is unknown, then the organism qualifier value must be indicated as “unidentified”. Any known taxonomic information should be indicated in the qualifier “note” for nucleotide sequences and the qualifier “note” for amino acid sequences.

WIPO Standard ST.26, paragraph 82, specifies that if the sequence is naturally occurring and the source organism does not have a Latin genus and species designation, such as a virus, then another acceptable scientific name (e.g., “Canine adenovirus type 2”) must be used as the organism qualifier value.

WIPO Standard ST.26, paragraph 83, specifies that if the sequence is not naturally occurring, the organism qualifier value must be indicated as “synthetic construct.” Further information with respect to the way the sequence was generated may be specified using the qualifier “note” for nucleotide sequences and the qualifier “note” for amino acid sequences.

WIPO Standard ST.26, paragraph 84, specifies that the “mol_type” qualifier for nucleotide sequences and the “mol_type” qualifier for amino acid sequences must disclose the type of molecule represented in the sequence. These qualifiers are distinct from the element INSDSeq_moltype discussed above, which must be indicated as DNA or RNA for nucleotide sequences (including nucleotide analogue sequences) and as AA for amino acid sequences:

(1) For a nucleotide sequence, the “mol_type” qualifier value must be one of the following: “genomic DNA”, “genomic RNA”, “mRNA”, “tRNA”, “rRNA”, “other RNA”, “other DNA”, “transcribed RNA”, “viral cRNA”, “unassigned DNA”, or “unassigned RNA”. If the sequence is not naturally occurring, i.e. the value of the “organism” qualifier is “synthetic construct”, the “mol_type” qualifier value must be either “other RNA” or “other DNA”;
(2) For an amino acid sequence, the “mol_type” qualifier value is “protein.”
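
Items (1) and (2) above can be expressed as a short lookup. This is a sketch only: the function name check_mol_type and its parameters are hypothetical, and it checks nothing beyond the values stated in this section.

```python
# Permitted "mol_type" qualifier values for nucleotide sequences, per item (1).
NUCLEOTIDE_MOL_TYPES = {
    "genomic DNA", "genomic RNA", "mRNA", "tRNA", "rRNA", "other RNA",
    "other DNA", "transcribed RNA", "viral cRNA", "unassigned DNA",
    "unassigned RNA",
}


def check_mol_type(insdseq_moltype: str, organism: str, mol_type: str) -> list[str]:
    """Return problems with a mol_type qualifier value (hypothetical helper)."""
    problems = []
    if insdseq_moltype == "AA":
        # Amino acid sequences: the only permitted value is "protein".
        if mol_type != "protein":
            problems.append('mol_type must be "protein" for amino acid sequences')
    elif insdseq_moltype in ("DNA", "RNA"):
        if mol_type not in NUCLEOTIDE_MOL_TYPES:
            problems.append("mol_type is not a permitted nucleotide value")
        elif organism == "synthetic construct" and mol_type not in ("other RNA", "other DNA"):
            problems.append('synthetic constructs require "other RNA" or "other DNA"')
    else:
        problems.append("INSDSeq_moltype must be DNA, RNA or AA")
    return problems
```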

IX. FREE TEXT

WIPO Standard ST.26, paragraph 85, specifies that “free text” is a type of value format for certain qualifiers presented in the form of a descriptive text phrase or other specified format (see MPEP § 2413.01(h) for the definition of “free text” and see Annex I of WIPO Standard ST.26 for controlled vocabulary).

WIPO Standard ST.26, paragraph 86, specifies that the use of free text must be limited to a few short terms indispensable for the understanding of a characteristic of the sequence. For each qualifier other than the “translation” qualifier, the free text must not exceed 1000 characters.
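
The length limit in paragraph 86 can be checked mechanically. The sketch below uses a hypothetical function name and assumes that "characters" can be counted as Unicode code points, which is how Python's `len` counts a string; the Standard's own counting convention should be confirmed before relying on this.

```python
MAX_FREE_TEXT_CHARS = 1000  # limit stated in paragraph 86


def free_text_within_limit(qualifier_name: str, value: str) -> bool:
    """Return True if a free text value respects the 1000-character limit."""
    if qualifier_name == "translation":
        # The "translation" qualifier is exempt from the limit.
        return True
    # Assumption: one Unicode code point counts as one character.
    return len(value) <= MAX_FREE_TEXT_CHARS
```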

WIPO Standard ST.26, paragraph 87, specifies that language-dependent free text is the free text value of certain qualifiers that is language-dependent in that it may require translation for international, national, or regional procedures. Qualifiers for nucleotide sequences with a language-dependent free text value format are identified in Annex I, Table 5: List of Qualifiers with Language-Dependent Free Text Values for Nucleotide Sequences (reproduced in MPEP § 2413.01(h)). Qualifiers for amino acid sequences with a language-dependent free text value format are identified in Annex I, Table 6: List of Qualifiers with Language-Dependent Free Text Values for Amino Acid Sequences (reproduced in MPEP § 2413.01(h)).

X. CODING SEQUENCES

WIPO Standard ST.26, paragraph 89, specifies that the “CDS” feature key may be used to identify coding sequences, i.e., sequences of nucleotides which correspond to the sequence of amino acids in a protein and the stop codon. The location of the “CDS” feature in the mandatory element INSDFeature_location must include the stop codon.

WIPO Standard ST.26, paragraph 90, specifies that the “transl_table” and “translation” qualifiers may be used with the “CDS” feature key (see Annex I of WIPO Standard ST.26). Where the “transl_table” qualifier is not used, the use of the Standard Code Table (see Annex I, Section 9, Table 7 of WIPO Standard ST.26) is assumed.

WIPO Standard ST.26, paragraph 91, specifies that the “transl_except” qualifier must be used with the “CDS” feature key and the “translation” qualifier to identify a codon that encodes either pyrrolysine or selenocysteine.

WIPO Standard ST.26, paragraph 92, specifies that an amino acid sequence encoded by the coding sequence and disclosed in a “translation” qualifier that is encompassed by the description of sequences found in MPEP § 2412.03 must be included in the sequence listing and assigned its own sequence identifier. The sequence identifier assigned to the amino acid sequence must be provided as the value in the qualifier “protein_id” with the “CDS” feature key. The “organism” qualifier of the “source” feature key for the amino acid sequence must be identical to that of its coding sequence.

XI. VARIANTS

MPEP § 2412.05(c) provides information about representation and inclusion of variants.

WIPO Standard ST.26, paragraph 93, specifies that a primary sequence and any variant of that sequence, each disclosed by enumeration of its residues and encompassed by the description of sequences found in MPEP § 2412.03 must each be included in the sequence listing and assigned its own sequence identifier.

WIPO Standard ST.26, paragraph 94, specifies that any variant sequence, disclosed as a single sequence with enumerated alternative residues at one or more positions, must be included in the sequence listing and should be represented by a single sequence, wherein the enumerated alternative residues are represented by the most restrictive ambiguity symbol. See MPEP § 2412.05(b), subsection II, for more information regarding representing alternative nucleotide residues and MPEP § 2412.05(d), subsection II, for more information regarding representing alternative amino acid residues.

WIPO Standard ST.26, paragraph 95, specifies that any variant sequence, disclosed only by reference to deletion(s), insertion(s), or substitution(s) in a primary sequence in the sequence listing, should be included in the sequence listing. Where included in the sequence listing, such a variant sequence:

(a) may be represented by annotation of the primary sequence, where it contains variation(s) at a single location or multiple distinct locations and the occurrence of those variations are independent;
(b) should be represented as a separate sequence and assigned its own sequence identifier, where it contains variations at multiple distinct locations and the occurrence of those variations are interdependent; and
(c) must be represented as a separate sequence and assigned its own sequence identifier, where it contains an inserted or substituted sequence that contains in excess of 1000 residues (see WIPO Standard ST.26, paragraph 86).
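
The three cases in paragraph 95 can be read as a decision procedure. The following sketch is illustrative only; the function name, parameters, and returned phrases are hypothetical, and it assumes item (c) takes precedence when it applies because it is the only mandatory case.

```python
def variant_representation(
    distinct_locations: int,
    interdependent: bool,
    longest_insert_or_substitution: int,
) -> str:
    """Suggest how a variant disclosed by reference may be represented."""
    if longest_insert_or_substitution > 1000:
        # Item (c): inserted or substituted sequence exceeds 1000 residues.
        return "must be a separate sequence with its own sequence identifier"
    if distinct_locations > 1 and interdependent:
        # Item (b): multiple locations with interdependent variations.
        return "should be a separate sequence with its own sequence identifier"
    # Item (a): single location, or multiple independent variations.
    return "may be represented by annotation of the primary sequence"
```

For instance, a variant with substitutions at three positions that always occur together falls under item (b), so `variant_representation(3, True, 1)` returns the "should be a separate sequence" result.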

WIPO Standard ST.26, paragraph 96, specifies the proper use of feature keys and qualifiers for nucleic acid and amino acid sequence variants from the table List of Feature Keys and Qualifiers (reproduced in MPEP § 2412.05(c)).

WIPO Standard ST.26, paragraph 97, specifies that annotation of a sequence for a specific variant must include a feature key and qualifier, as indicated in the table in MPEP § 2412.05(c), and the feature location. The value for the “replace” qualifier must be only a single alternative nucleotide or nucleotide sequence using only the symbols set forth in Table 1: List of Nucleotides Symbols (see MPEP § 2413.01(a)), or empty. A listing of alternative residues may be provided as the value in the “note” qualifier. In particular, a listing of alternative amino acids must be provided as the value in the “note” qualifier where “X” is used in a sequence, and represents a value other than “any one of ‘A’, ‘R’, ‘N’, ‘D’, ‘C’, ‘Q’, ‘E’, ‘G’, ‘H’, ‘I’, ‘L’, ‘K’, ‘M’, ‘F’, ‘P’, ‘O’, ‘S’, ‘U’, ‘T’, ‘W’, ‘Y’, or ‘V.’” A deletion must be represented by an empty qualifier value for the “replace” qualifier or by an indication in the “note” qualifier that the residue may be deleted. An inserted or substituted residue(s) must be provided in the “replace” or “note” qualifier. The value format for the “replace” and “note” qualifiers is free text and must not exceed 1000 characters. See below for sequences encompassed by the definition in MPEP § 2412.03 that are provided as an insertion or a substitution in a qualifier value.
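
A check of a nucleotide “replace” qualifier value against these constraints might look like the sketch below. It is an assumption-laden illustration: the function name is hypothetical, and the symbol set is written out here on the assumption that Table 1 consists of the lowercase symbols a, c, g, t, m, r, w, s, y, k, v, h, d, b, and n; the table itself should be consulted before relying on it.

```python
# Assumed contents of Table 1 (lowercase nucleotide symbols); verify against the table.
NUCLEOTIDE_SYMBOLS = set("acgtmrwsykvhdbn")
MAX_FREE_TEXT_CHARS = 1000


def replace_value_is_valid(value: str) -> bool:
    """Check a "replace" qualifier value for a nucleotide sequence variant.

    An empty string is accepted because an empty value represents a deletion.
    """
    if len(value) > MAX_FREE_TEXT_CHARS:
        return False
    # Every character must be one of the assumed Table 1 symbols.
    return set(value) <= NUCLEOTIDE_SYMBOLS
```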

WIPO Standard ST.26, paragraph 98, specifies that the symbols set forth in Tables 1 to 4 of Annex I, reproduced in MPEP §§ 2412.03(a), 2412.03(c), and 2412.05(b), subsection III, should be used to represent variant residues where appropriate. For the “note” qualifier, where the variant residue is a modified residue not set forth in Tables 2 or 4, the complete unabbreviated name of the modified residue must be provided as the qualifier value. Modified residues must be further described in a feature table as described in MPEP § 2412.05(b), subsection III, for modified nucleotides and MPEP § 2412.05(d), subsection III, for modified amino acids.

WIPO Standard ST.26, paragraph 100, specifies that a sequence encompassed by the description of sequences found in MPEP § 2412.03 that is provided as an insertion or a substitution in a qualifier value for a primary sequence annotation must also be included in the sequence listing and assigned its own sequence identifier.

2413.01(h) Language Dependent Free Text Qualifier Values in the English Language [R-01.2024]

37 CFR 1.833 Requirements for a “Sequence Listing XML” for nucleotide and/or amino acid sequences as part of a patent application filed on or after July 1, 2022.

*****
(b) The “Sequence Listing XML” presented in accordance with paragraph (a) of this section must further:
- *****
- (3) Include an INSDQualifier_value element with a value in English for any language-dependent free text qualifier as defined by paragraphs 76 and 85–87 of WIPO Standard ST.26, and as required by § 1.52(b)(1)(ii).

WIPO Standard ST.26, paragraph 3(n), uses the term “free text” to mean a type of value format for certain qualifiers, presented in the form of a descriptive text phrase or other specified format.

WIPO Standard ST.26, paragraph 3(o), uses the term “language-dependent free text” to mean the free text value of certain qualifiers that may require translation for international, national or regional procedures.

WIPO Standard ST.26, paragraph 85, provides that free text is a type of value format for certain qualifiers presented in the form of a descriptive text phrase or other specified format with reference to the controlled vocabulary found in Annex I of WIPO Standard ST.26. See MPEP § 2413.01(g), subsection IX.

WIPO Standard ST.26, paragraph 86, requires that the use of free text be limited to short terms indispensable for the understanding of a characteristic of the sequence. For each qualifier other than the “translation” qualifier, the free text must not exceed 1000 characters.

Table 5: List of Qualifiers with Language-Dependent Free Text Values for Nucleotide Sequences
Section	Language-Dependent Free Text Qualifier
6.3	bound_moiety
6.5	cell_type
6.8	clone
6.9	clone_lib
6.11	collected_by
6.14	cultivar
6.15	dev_stage
6.18	ecotype
6.21	frequency
6.22	function
6.24	gene_synonym
6.26	haplogroup
6.28	host
6.29	identified_by
6.30	isolate
6.31	isolation_source
6.32	lab_host
6.36	mating_type
6.41	note
6.45	organism
6.47	phenotype
6.49	pop_variant
6.50	product
6.66	serotype
6.67	serovar
6.68	sex
6.69	standard_name
6.70	strain
6.71	sub_clone
6.72	sub_species
6.73	sub_strain
6.75	tissue_lib
6.76	tissue_type
6.81	variety

Reproduced from Table 5 of WIPO Standard ST.26, Annex I, Section 6.

Qualifiers with language-dependent free text values for amino acid sequences are identified below:

Table 6: List of Qualifiers with Language-Dependent Free Text Values for Amino Acid Sequences
Section	Language-Dependent Free Text Qualifier
8.2	note
8.3	organism

Reproduced from Table 6 of WIPO Standard ST.26, Annex I, Section 8.

(a) Language-dependent free text must be presented in the INSDQualifier_value element in English, or in the NonEnglishQualifier_value element in a language other than English, or in both elements. Note that if an organism name is a Latin genus and species name, no translation is required. Technical terms and proper names originating from non-English words that are used internationally are considered English for the purpose of the value of the INSDQualifier_value element (e.g., ‘in vitro’, ‘in vivo’).
(b) If a NonEnglishQualifier_value element is present in a sequence listing, the appropriate language code (see WIPO Standard ST.26 at paragraph 9, which references the International Standard ISO 639-1:2002) must be indicated in the nonEnglishFreeTextLanguageCode attribute in the root element (see List of File Attributes reproduced in MPEP § 2413.01(e)). All NonEnglishQualifier_value elements in a single sequence listing must have values in the language indicated in the nonEnglishFreeTextLanguageCode attribute. The NonEnglishQualifier_value element is permitted only for qualifiers that have a language-dependent free text value format.
(c) Where NonEnglishQualifier_value and INSDQualifier_value are both present for a single qualifier, the information contained in the two elements must be equivalent. One of the following conditions must be true: NonEnglishQualifier_value contains a translation of the value of INSDQualifier_value; or, INSDQualifier_value contains a translation of the value of NonEnglishQualifier_value; or, both elements contain a translation of the qualifier value from the language specified in the originalFreeTextLanguageCode attribute (see List of File Attributes reproduced in MPEP § 2413.01(e)).
(d) For qualifiers with a language-dependent free text value, the INSDQualifier element may include an optional attribute id. The value of this attribute must be in the format “q” followed by a positive integer, e.g. “q23”, and must be unique to one INSDQualifier element, i.e. the attribute value must only be used once in a “Sequence Listing XML” file (WIPO Standard ST.26, paragraph 87).
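
Some of the conditions in items (b) and (d) can be checked with a standard XML parser. The following is a minimal sketch using Python's `xml.etree.ElementTree`; the function name is hypothetical, and the pattern rejects leading zeros (e.g. "q01") on the assumption that the integer is written without padding. It does not verify translations or that NonEnglishQualifier_value is used only with language-dependent qualifiers.

```python
import re
import xml.etree.ElementTree as ET

# "q" followed by a positive integer; rejecting leading zeros is an assumption.
QUALIFIER_ID = re.compile(r"q[1-9][0-9]*")


def check_language_dependent_qualifiers(xml_text: str) -> list:
    """Return a list of problem descriptions found in the XML text."""
    root = ET.fromstring(xml_text)
    problems = []
    seen_ids = set()
    has_non_english = False
    for qualifier in root.iter("INSDQualifier"):
        qualifier_id = qualifier.get("id")
        if qualifier_id is not None:
            if not QUALIFIER_ID.fullmatch(qualifier_id):
                problems.append(f"bad id format: {qualifier_id}")
            elif qualifier_id in seen_ids:
                problems.append(f"duplicate id: {qualifier_id}")
            seen_ids.add(qualifier_id)
        if qualifier.find("NonEnglishQualifier_value") is not None:
            has_non_english = True
    # Item (b): a language code is required once any non-English value is present.
    if has_non_english and not root.get("nonEnglishFreeTextLanguageCode"):
        problems.append("nonEnglishFreeTextLanguageCode missing on root element")
    return problems
```

An empty list means none of the checked conditions was violated; it does not mean the file complies with the Standard as a whole.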

2413.01(i) Title Element in “Sequence Listing XML” [R-01.2024]

The element InventionTitle described in the table of elements of the general information part (reproduced in MPEP § 2413.01(f) from WIPO Standard ST.26, paragraph 45) as a required element of the “general information” part of the “Sequence Listing XML” must be in the language of filing. It is possible for an applicant to provide more than one InventionTitle element of the general information part in more than one language. However, when the USPTO is the Receiving Office (RO/US) for an international application, the English language is required for the InventionTitle element and also for language-dependent free text of the “Sequence Listing XML.” (See MPEP § 2413.01(h) for more information about language-dependent free text.) If non-English language is presented to RO/US for a “Sequence Listing XML,” then RO/US will transfer the international application to the International Bureau, under PCT Rule 19.4.

2413.02 Form and Format of the XML file containing the “Sequence Listing XML” [R-01.2024]

37 CFR 1.834 Form and format for nucleotide and/or amino acid sequence submissions as the “Sequence Listing XML” in patent applications filed on or after July 1, 2022.

(a) A “Sequence Listing XML” encoded using Unicode UTF–8, created by any means (e.g., text editors, nucleotide/amino acid sequence editors, or other custom computer programs) in accordance with §§ 1.831 through 1.833, must:
- (1) Have the following compatibilities:
  - (i) Computer compatibility: PC or Mac®; and
  - (ii) Operating system compatibility: MS–DOS®, MS-Windows®, Mac OS®, or Unix®/Linux®.
- (2) Be in XML format, where all permitted printable characters (including the space character) and nonprintable (control) characters are defined in paragraph 40 of WIPO Standard ST.26 (incorporated by reference, see § 1.839).
- (3) Be named as *.xml, where “*” is one character or a combination of characters limited to upper- or lowercase letters, numbers, hyphens, and underscores, and the name does not exceed 60 characters in total, excluding the extension. No spaces or other types of characters are permitted in the file name.
*****

In order for the USPTO to be able to process the “Sequence Listing XML” .xml file, all characters must be encoded using Unicode UTF-8. The file must be compatible with PC or Mac® computers using one of the following operating systems: MS–DOS®, MS-Windows®, Mac OS®, or Unix®/Linux®. The printable and non-printable characters in the .xml file are defined in paragraphs 40 and 41 of WIPO Standard ST.26 (see MPEP § 2413.01(a)), and Annex IV of WIPO Standard ST.26 provides a table of the CHARACTER SUBSET FROM THE UNICODE BASIC LATIN CODE TABLE FOR USE IN AN XML INSTANCE OF A SEQUENCE LISTING.
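
The file-name rule of 37 CFR 1.834(a)(3) and the UTF-8 requirement can both be pre-checked before filing. The sketch below uses hypothetical function names and makes two assumptions that the rule does not spell out: that "letters" and "numbers" mean ASCII letters and digits, and that the extension is written in lowercase. It does not test the character subset of paragraphs 40 and 41 or Annex IV.

```python
import re

# 1 to 60 permitted characters, then the ".xml" extension (lowercase assumed).
FILE_NAME = re.compile(r"[A-Za-z0-9_-]{1,60}\.xml")


def file_name_is_acceptable(name: str) -> bool:
    """Check a file name against the pattern derived from 1.834(a)(3)."""
    return FILE_NAME.fullmatch(name) is not None


def is_utf8(data: bytes) -> bool:
    """Return True if the raw file bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

For example, `file_name_is_acceptable("seq listing.xml")` is false because of the space, and a file saved as UTF-16 will typically fail `is_utf8`.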

2413.03 How to Submit the “Sequence Listing XML” [R-01.2024]

37 CFR 1.834 Form and format for nucleotide and/or amino acid sequence submissions as the “Sequence Listing XML” in patent applications filed on or after July 1, 2022.

*****
(b) The “Sequence Listing XML” must be in a single file containing the sequence information and be submitted either:
- (1) Electronically via the USPTO patent electronic filing system, where the file size must not exceed 100 MB, and file compression is not permitted; or
- (2) On read-only optical disc(s) in compliance with § 1.52(e), where:
  - (i) A file that is not compressed must be contained on a single read-only optical disc;
  - (ii) The file may be compressed using WinZip®, 7-Zip, or Unix®/Linux® Zip;
  - (iii) A compressed file must not be self-extracting; or
  - (iv) A compressed XML file that does not fit on a single read-only optical disc may be split into multiple file parts, in accordance with the target read-only optical disc size, and labeled in compliance with § 1.52(e)(5)(vi);
*****

For submission of a “Sequence Listing XML” that is 100MB in size or less, the preferred method of submission is via the USPTO patent electronic filing system. Only the Patent Center system of the USPTO patent electronic filing system is capable of accepting .xml files. If the “Sequence Listing XML” is larger than 100MB in size, then applicant may submit the .xml file on physical media using read-only optical discs (CDs or DVDs). “Sequence Listing XML” submissions on discs may be compressed if the resulting compressed file is non-self-extracting.
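
The size threshold above can be turned into a simple pre-filing check. This is a sketch only: the function name and return strings are hypothetical, and the byte value used for "100 MB" (100 × 1024 × 1024) is an assumption, since the rule does not state whether decimal or binary megabytes are meant.

```python
# Assumption: 100 MB interpreted as 100 * 1024 * 1024 bytes; the rule does not say.
ASSUMED_LIMIT_BYTES = 100 * 1024 * 1024


def preferred_submission_route(size_bytes: int, limit_bytes: int = ASSUMED_LIMIT_BYTES) -> str:
    """Suggest a submission route from the uncompressed file size."""
    if size_bytes <= limit_bytes:
        # Electronic filing: compression is not permitted.
        return "electronic filing system, uncompressed"
    # Larger files: physical media; compression allowed if not self-extracting.
    return "read-only optical disc, optionally compressed (not self-extracting)"
```

Passing a different `limit_bytes` value lets the caller apply whichever interpretation of the limit the Office confirms.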

The USPTO patent electronic filing system will prohibit an applicant from submitting both a “Sequence Listing XML” (a sequence listing that conforms to WIPO Standard ST.26 as implemented in 37 CFR 1.831 through 1.835) and a “Sequence Listing” (a sequence listing that conforms to ST.25 as implemented in 37 CFR 1.821 through 1.825) in the same submission. Filing a “Sequence Listing” in a 35 U.S.C. 111(a) application having a filing date on or after July 1, 2022, will result in a notice from the Office of Patent Application Processing (OPAP) informing applicant that the submission fails to comply with 37 CFR 1.831 through 1.834 and will require submission of a “Sequence Listing XML.” See MPEP § 2415.03 for addressing improper submission of a “Sequence Listing” that complies with 37 CFR 1.821-1.824 where a “Sequence Listing XML” that complies with 37 CFR 1.831-1.834 is required in a national nonprovisional application.

To facilitate administrative processing of all papers and read-only optical discs associated with sequence rule compliance, all read-only optical discs, fees, and papers accompanying them filed in the Office should be marked “Mail Stop SEQUENCE.” Correspondence relating to the submission of a “Sequence Listing XML” may also be hand-delivered to the Customer Service Window. In cases of hand delivery to the Customer Service Window, the read-only optical disc enclosed in a hard case should be placed in a protective mailer. The use of staples and clips, if any, should be confined to carefully attaching the protective mailer to the submitted papers without contact or compression of the media. The labeling requirements of 37 CFR 1.52(e), including the application number (if known), apply to all read-only optical disc submissions. For submission of a “Sequence Listing XML” larger than 100MB in size in a new application, it is recommended that the user file the application without the “Sequence Listing XML” using the USPTO patent electronic filing system to obtain the application number, and then file the “Sequence Listing XML” on read-only optical disc(s) in accordance with 37 CFR 1.52(e) on the same day by using Priority Mail Express® from the USPS in accordance with 37 CFR 1.10, or hand delivery, in order to secure the same filing date for all parts of the application. In no situations should additional or complimentary electronic copies be delivered to examiners or other Office personnel.

2413.04 Requirements Regarding Incorporation By Reference of the “Sequence Listing XML” [R-01.2024]

37 CFR 1.834 Form and format for nucleotide and/or amino acid sequence submissions as the “Sequence Listing XML” in patent applications filed on or after July 1, 2022.

*****
- (c)(1) Unless paragraph (c)(2) of this section applies, when the “Sequence Listing XML” required by § 1.831(a) is submitted in XML file format via the USPTO patent electronic filing system or on a read-only optical disc (in compliance with § 1.52(e)), then the specification must contain a statement in a separate paragraph (see § 1.77(b)(5)) that incorporates by reference the material in the XML file identifying:
  - (i) The name of the file;
  - (ii) The date of creation; and
  - (iii) The size of the file in bytes; or
- (2) If the “Sequence Listing XML” required by § 1.831(a) is submitted in XML file format via the USPTO patent electronic filing system or on a read-only optical disc (in compliance with § 1.52(e)) for an international application during the international stage, then an incorporation by reference statement of the material in the XML file is not required.

37 CFR 1.835 Amendment to add or replace a “Sequence Listing XML” in patent applications filed on or after July 1, 2022.

*****
(c) The specification of a complete application, filed on the application filing date, with a “Sequence Listing XML” as required under § 1.831(a), without an incorporation by reference of the material contained in the “Sequence Listing XML” file, must be amended to include a separate paragraph incorporating by reference the material contained in the “Sequence Listing XML” file, in accordance with § 1.77(b)(5)(ii), except for international applications.
*****

Since the “Sequence Listing XML” is not the text of the specification, but rather is sequence data in an XML file format, an incorporation by reference statement is needed to ensure that the content of the “Sequence Listing XML,” submitted to the USPTO as an XML file, is considered part of the disclosure capable of providing 35 U.S.C. 112(a) support for the disclosure and any claims relating to nucleotide and/or amino acid sequences. The incorporation by reference statement identifies: (i) the name of the file; (ii) the date of creation of the file; and (iii) the size of the file in bytes. Note that this requirement pertaining to applicant submission of a “Sequence Listing XML” does not apply to a sequence listing that is part of an international application and communicated to the USPTO under PCT Article 20, for a national phase application.
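
The three identifying items can be assembled into a draft paragraph programmatically. The sketch below is illustrative only: the function name is hypothetical, the sentence wording and the ISO date format are assumptions rather than language prescribed by the rule, and the creation date is passed in by the caller because file-system timestamps do not reliably record creation time on every operating system.

```python
from datetime import date


def incorporation_statement(file_name: str, created: date, size_bytes: int) -> str:
    """Draft a statement naming the file, its creation date, and its size in bytes."""
    return (
        f"The contents of the electronic sequence listing ({file_name}; "
        f"Size: {size_bytes} bytes; Date of Creation: {created.isoformat()}) "
        "are incorporated herein by reference in their entirety."
    )
```

The size in bytes can be obtained with `os.path.getsize` on the final, uncompressed .xml file; any draft produced this way should be reviewed against 37 CFR 1.834(c)(1) before filing.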

The incorporation by reference statement of the material in an .xml file is required to be part of the specification so it is clear to the Office, the printer, and the public that the application as originally filed includes material on an .xml file. However, for international applications during the international phase, where a “Sequence Listing XML” is submitted via the USPTO patent electronic filing system or on read-only optical disc, no such incorporation by reference statement is required.

If an applicant submits a “Sequence Listing XML” with the complete specification but fails to include the required incorporation by reference paragraph, a notice can be issued by pre-examination requiring the incorporation by reference paragraph in the context of a substitute specification under 37 CFR 1.125. Similarly, the examiner could require applicant to amend the specification to include an incorporation by reference paragraph of the “Sequence Listing XML” file. See also form paragraph 24.24.26.

2413.05 Presumptions Regarding Compliance [R-01.2024]

Neither the presence nor absence of information which is not required under the sequence rules will create a presumption that such information is necessary to satisfy any of the requirements of 35 U.S.C. 112. Further, the grant of a patent on an application that is subject to 37 CFR 1.831 through 37 CFR 1.835 constitutes a presumption that the granted patent complies with the requirements of these rules.

Last Modified: 10/30/2024 08:50:24