GTFS Data Best Practices

Introduction

These are recommended practices for describing public transportation services in the General Transit Feed Specification (GTFS). These practices have been synthesized from the experience of the GTFS Best Practices working group members and application-specific GTFS practice recommendations. For further background, see the Frequently Asked Questions.

Linking to This Document

Please link here in order to provide feed producers with guidance for correct formation of GTFS data. Each individual recommendation has an anchor link. Click the recommendation to get the URL for the in-page anchor link.

If your GTFS-consuming application involves particular requirements or recommendations for GTFS data practices, it is recommended to publish a document with those requirements or recommendations to supplement these common best practices.

Document Structure

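Recommended practices are organized into three primary sections:
  • Dataset Publishing & General Practices
  • Practice Recommendations Organized by File
  • Practice Recommendations Organized by Case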

System Tags

Five different tags are included throughout the list of practices. These tags indicate the type of systems that require the practice being described.

These practices improve customer experience in applications like Google Maps that are used for trip planning.

These practices help maintain the ability for a human reader to unzip and examine GTFS files.

These practices allow arrival prediction software to create real-time arrival estimates related to the schedules in trips.txt and stop_times.txt.

These practices support the creation of HTML timetables based on GTFS, such as with the GTFS-to-HTML software.

Practices

Dataset Publishing & General Practices

# Recommendation
1 Datasets should be published at a public, permanent URL, including the zip file name (e.g., www.agency.org/gtfs/gtfs.zip). Ideally, the URL should be directly downloadable without requiring a login, to facilitate download by consuming software applications. While it is recommended (and the most common practice) to make a GTFS dataset openly downloadable, a data provider that needs to control access to GTFS for licensing or other reasons should do so using API keys, which still allow automatic downloads.
2 GTFS data is published in iterations so that a single file at a stable location always contains the latest official description of service for a transit agency (or agencies).
3 Maintain persistent identifiers (id fields) for stop_id, route_id, and agency_id across data iterations whenever possible.
4 One GTFS dataset should contain current and upcoming service (sometimes called a “merged” dataset).
  • At any time, the published GTFS dataset should be valid for at least the next 7 days, and ideally for as long as the operator is confident that the schedule will continue to be operated.
  • If possible, the GTFS dataset should cover at least the next 30 days of service.
5 Remove old services (expired calendars) from the feed.
6 If a service modification will go into effect in 7 days or fewer, express this service change through a GTFS-realtime feed (service advisories or trip updates) rather than a static GTFS dataset.
7 The web-server hosting GTFS data should be configured to correctly report the file modification date (see HTTP/1.1 - Request for Comments 2616, under Section 14.29).
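
Recommendations 4 and 5 above can be checked mechanically. The following is a minimal sketch, assuming a feed that describes its service in calendar.txt only (calendar_dates.txt is ignored here); the sample rows, the function name check_coverage, and the min_days parameter are illustrative and not part of GTFS.

```python
import csv
import io
from datetime import date, datetime, timedelta

# Hypothetical calendar.txt content, used only for illustration.
CALENDAR_TXT = """service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date
WINTER,1,1,1,1,1,0,0,20230101,20230331
SPRING,1,1,1,1,1,0,0,20230401,20230630
"""


def parse_gtfs_date(value):
    """Parse a GTFS date in YYYYMMDD form."""
    return datetime.strptime(value, "%Y%m%d").date()


def check_coverage(calendar_text, today, min_days=7):
    """Return (expired service_ids, whether calendars reach today + min_days).

    Assumes calendar_text contains at least one service row.
    """
    rows = list(csv.DictReader(io.StringIO(calendar_text)))
    expired = [r["service_id"] for r in rows
               if parse_gtfs_date(r["end_date"]) < today]
    last_day = max(parse_gtfs_date(r["end_date"]) for r in rows)
    return expired, last_day >= today + timedelta(days=min_days)


expired, covered = check_coverage(CALENDAR_TXT, date(2023, 4, 10))
print(expired, covered)
```

Passing min_days=30 checks the longer horizon suggested above; any service_id reported as expired is a candidate for removal from the feed.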

Practice Recommendations Organized by File

This section shows practices organized by file and field, aligning with the GTFS reference.

All Files

Topic Tags # Recommendation
Mixed Case 1 All customer-facing text strings (including stop names, route names, and headsigns) should use Mixed Case (not ALL CAPS), following local conventions for capitalization of place names on displays capable of displaying lower case characters.
Examples
Brighton Churchill Square
Villiers-sur-Marne
Market Street
Abbreviations 2 Avoid use of abbreviations throughout the feed for names and other text (e.g. St. for Street) unless a location is called by its abbreviated name (e.g. “JFK Airport”). Abbreviations may be problematic for accessibility by screen reader software and voice user interfaces. Consuming software can be engineered to reliably convert full words to abbreviations for display, but converting from abbreviations to full words is prone to more risk of error.
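
A feed producer could screen customer-facing strings for the Mixed Case recommendation with a simple heuristic. This is only a sketch: the function name looks_all_caps and the min_letters threshold (which lets short abbreviated names such as "JFK" pass) are assumptions for illustration, and flagged values still need human review.

```python
def looks_all_caps(text, min_letters=4):
    """Return True when text has at least min_letters letters and none are lower case."""
    letters = [ch for ch in text if ch.isalpha()]
    return len(letters) >= min_letters and all(ch.isupper() for ch in letters)


names = ["Brighton Churchill Square", "MARKET STREET", "JFK Airport", "JFK"]
flagged = [name for name in names if looks_all_caps(name)]
print(flagged)
```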

agency.txt

Field Name Tags # Recommendation
agency_id 1 Should be included, even if there is only one agency in the feed. (See also: recommendation to include agency_id in routes.txt and fare_attributes.txt)
agency_lang 2 Should be included
agency_phone 3 Should be included unless no such customer service phone exists.
agency_email 4 Should be included unless no such customer service email exists.
agency_fare_url 5 Should be included unless the agency is fully fare-free.

feed_info.txt

feed_info.txt should be included, with all fields below.

Field Name Tags # Recommendation
feed_publisher_name 1 Should be included
feed_publisher_url 2 Should be included
feed_lang 3 Should be included
feed_start_date & feed_end_date 4 Should be included
feed_version 5 Should be included
feed_contact_email & feed_contact_url 6 Provide at least one

stops.txt

Field Name Tags # Recommendation
stop_id 1 Stops that are in different physical locations (i.e., different designated precise locations for vehicles on designated routes to stop, potentially distinguished by signs, shelters, or other such public information, located on different street corners or representing different boarding facility such as a platform or bus bay, even if nearby each other) should have different stop_id.
2 stop_id is an internal ID, not intended to be shown to passengers.
3 Maintain consistent stop_id for the same stops across data iterations (see: Dataset Publishing & General Practices).
stop_name 4 The stop_name should match the agency's public name for the stop, station, or boarding facility, e.g. what is printed on a timetable, published online, and/or presented at the location.
5 When there is not a published stop name, follow consistent stop naming conventions throughout the feed.
6 Avoid use of abbreviations other than for places that are most commonly called by an abbreviated name. See Abbreviations (#2) under All Files.
7 Provide stop names in mixed case, following local conventions, as per recommendation for all customer-facing text fields.
8 By default, stop_name should not contain generic or redundant words like “Station” or “Stop”, but some edge cases are allowed:
  • When it is actually part of the name (Union Station, Central Station)
  • When the stop_name is too generic (such as if it is the name of the city). “Station”, “Terminal”, or other words make the meaning clear.
stop_lat & stop_lon 9 Stop locations should be as accurate as possible. Stop locations should have an error of no more than four meters when compared to the actual stop position.
10 Stop locations should be placed very near to the pedestrian right of way where a passenger will board (i.e. correct side of the street).
11 If a stop location is shared across separate data feeds (i.e. two agencies use exactly the same stop / boarding facility), indicate the stop is shared by using the exact same stop_lat and stop_lon for both stops.
stop_code 12 stop_code should be included in GTFS if there are passenger-facing stop numbers or short identifiers.
parent_station & location_type 13 Many stations or terminals have multiple boarding facilities (depending on mode, they might be called a bus bay, platform, wharf, gate, or another term). In such cases, feed producers should describe stations, boarding facilities (also called child stops), and their relation.
  • The station or terminal should be defined as a record in stops.txt with location_type = 1.
  • Each boarding facility should be defined as a stop with location_type = 0. The parent_station field should reference the stop_id of the station the boarding facility is in.
14 When naming the station and child stops, set names that are well-recognized by riders, and can help riders to identify the station and boarding facility (bus bay, platform, wharf, gate, etc.).
Parent Station Name | Child Stop Name
Chicago Union Station | Chicago Union Station Platform 19
San Francisco Ferry Building Terminal | San Francisco Ferry Building Terminal Gate E
Downtown Transit Center | Downtown Transit Center Bay B
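
The station and boarding-facility relationship in recommendation 13 can be illustrated with a few stops.txt rows and a consistency check. This is a sketch under stated assumptions: the stop_id values and coordinates are invented, and parent_station_problems is a hypothetical helper, not part of any GTFS tool.

```python
import csv
import io

# Hypothetical stops.txt rows; IDs and coordinates are made up for illustration.
STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon,location_type,parent_station
DTC,Downtown Transit Center,45.5200,-122.6800,1,
DTC_B,Downtown Transit Center Bay B,45.5201,-122.6801,0,DTC
DTC_C,Downtown Transit Center Bay C,45.5202,-122.6802,0,MISSING
"""


def parent_station_problems(stops_text):
    """List (stop_id, parent_station) pairs whose parent is not a location_type=1 record."""
    rows = list(csv.DictReader(io.StringIO(stops_text)))
    stations = {r["stop_id"] for r in rows if r["location_type"] == "1"}
    problems = []
    for r in rows:
        parent = r["parent_station"]
        if parent and parent not in stations:
            problems.append((r["stop_id"], parent))
    return problems


problems = parent_station_problems(STOPS_TXT)
print(problems)
```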

stop_times.txt

Field Name Tags # Recommendation
pickup_type & drop_off_type 1 Non-revenue (deadhead) trips that do not provide passenger service should be marked with pickup_type and drop_off_type value of 1 for all stop_times rows.
2 On revenue trips, internal “timing points” for monitoring operational performance and other places such as garages that a passenger cannot board should be marked with pickup_type = 1 (no pickup available) and drop_off_type = 1 (no drop off available).
timepoint 3 The timepoint field should be provided. It specifies which stop_times the operator will attempt to strictly adhere to (timepoint=1), and that other stop times are estimates (timepoint=0).
arrival_time & departure_time 4 arrival_time and departure_time fields should specify time values whenever possible, including non-binding estimated or interpolated times between timepoints.
stop_headsign 5 stop_headsign values override the trip_headsign (in trips.txt). stop_headsign values should be provided when the text displayed on the headsign changes during a trip. stop_headsign values do not “carry down” to subsequent stops, and therefore values must be repeated if the stop headsign remains the same. In general, headsign values should also correspond to signs in the stations. Examples:
In NYC, for the 2 going Southbound:
For stop_times.txt rows: | Use stop_headsign value:
Until Manhattan is Reached | Manhattan & Brooklyn
Until Downtown is Reached | Downtown & Brooklyn
Until Brooklyn is Reached | Brooklyn
Once Brooklyn is Reached | Brooklyn (New Lots Av)
In Boston, for the Red Line going Southbound, for the Braintree branch:
For stop_times.txt rows: | Use stop_headsign value:
Until Downtown is Reached | Inbound to Braintree
Once Downtown is Reached | Braintree
After Downtown | Outbound to Braintree
In the above two cases, “Southbound” would mislead customers because it is not used in the station signs.
shape_dist_traveled 6 shape_dist_traveled must be provided for routes that have looping or inlining (the vehicle crosses or travels over the same portion of alignment in one trip). See: shapes.shape_dist_traveled recommendation.
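
Because stop_headsign values override trip_headsign and do not carry down, a consumer might resolve the headsign for each stop_times row as sketched below. The function name effective_headsigns and the sample rows are illustrative assumptions; the rows loosely follow the Boston example above.

```python
def effective_headsigns(trip_headsign, stop_time_rows):
    """Return the headsign for each row: stop_headsign if set, else trip_headsign."""
    result = []
    for row in stop_time_rows:
        stop_headsign = row.get("stop_headsign", "")
        # An empty stop_headsign falls back to the trip value; the previous
        # row's stop_headsign is deliberately not reused.
        result.append(stop_headsign or trip_headsign)
    return result


rows = [
    {"stop_sequence": 1, "stop_headsign": "Inbound to Braintree"},
    {"stop_sequence": 2, "stop_headsign": "Inbound to Braintree"},
    {"stop_sequence": 3, "stop_headsign": ""},
]
signs = effective_headsigns("Braintree", rows)
print(signs)
```

The second row repeats its value explicitly, which is what a producer must do to keep the same headsign across consecutive stops.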

Loop routes: Loop routes require special stop_times considerations. (See: Loop route case)

transfers.txt

transfers.transfer_type can be one of four values defined in the GTFS. These transfer_type definitions are quoted from the GTFS Specification below, in italics, with additional practice recommendations.

Field Name Tags # Recommendation
transfers.transfer_type 1 0 or (empty): This is a recommended transfer point between routes.
If there are multiple transfer opportunities that include a superior option (i.e. a transit center with additional amenities or a station with adjacent or connected boarding facilities/platforms), specify a recommended transfer point.
2 1: This is a timed transfer point between two routes. The departing vehicle is expected to wait for the arriving one, with sufficient time for a passenger to transfer between routes.
This transfer type overrides a required interval to reliably make transfers. As an example, Google Maps assumes that passengers need 3 minutes to safely make a transfer. Other applications may assume other defaults.
3 2: This transfer requires a minimum amount of time between arrival and departure to ensure a connection. The time required to transfer is specified by min_transfer_time. Specify minimum transfer time if there are obstructions or other factors which increase the time to travel between stops.
4 3: Transfers are not possible between routes at this location. Specify this value if transfers are not possible because of physical barriers, or if they are made unsafe or complicated by difficult road crossings or gaps in the pedestrian network.
5 If in-seat (block) transfers are allowed between trips, then the last stop of the arriving trip must be the same as the first stop of the departing trip.
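
The four transfer_type values can be read as rules a trip planner applies to the gap between an arrival and a departure. The following sketch is one possible interpretation, not a reference implementation: transfer_is_feasible is a hypothetical function, and default_min_seconds=180 simply mirrors the 3-minute default cited above for Google Maps.

```python
def transfer_is_feasible(transfer_type, gap_seconds, min_transfer_time=None,
                         default_min_seconds=180):
    """Decide whether a rider can connect, given the arrival-to-departure gap.

    default_min_seconds is an assumed application default, not part of GTFS.
    """
    if transfer_type in ("", "0"):   # recommended transfer point
        return gap_seconds >= default_min_seconds
    if transfer_type == "1":         # timed transfer: departing vehicle waits
        return gap_seconds >= 0
    if transfer_type == "2":         # minimum transfer time applies
        required = (default_min_seconds if min_transfer_time is None
                    else min_transfer_time)
        return gap_seconds >= required
    if transfer_type == "3":         # transfers are not possible here
        return False
    raise ValueError(f"unknown transfer_type: {transfer_type!r}")


print(transfer_is_feasible("1", 60), transfer_is_feasible("0", 60))
```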

trips.txt

  • See special case for loop routes: Loop routes are cases where trips start and end at the same stop, as opposed to linear routes, which have two distinct termini. Loop routes must be described following specific practices. See Loop route case below.
  • See special case for lasso routes: Lasso routes are a hybrid of linear and loop geometries, in which vehicles travel on a loop for only a portion of the route. Lasso routes must be described following specific practices. See Lasso route case below.
Field Name Tags # Recommendation
trip_headsign 1 Do not provide route names (matching route_short_name and route_long_name) in the trip_headsign or stop_headsign fields.
2 Should contain destination, direction, and/or other trip designation text shown on the headsign of the vehicle which may be used to distinguish amongst trips in a route. Consistency with direction information shown on the vehicle is the primary and overriding goal for determining headsigns supplied in GTFS datasets. Other information should be included only if it does not compromise this primary goal. If headsigns change during a trip, override trip_headsign with stop_times.stop_headsign. Below are recommendations for some possible cases.
Route Description | Recommendation
2A. Destination-only | Provide the terminus destination, e.g. “Transit Center”, “Portland City Center”, or “Jantzen Beach”.
2B. Destinations with waypoints | <destination> via <waypoint>, e.g. “Highgate via Charing Cross”. If waypoint(s) are removed from the headsign shown to passengers after the vehicle passes those waypoints, use stop_times.stop_headsign to set an updated headsign.
2C. Regional placename with local stops | If there will be multiple stops inside the city or borough of destination, use stop_times.stop_headsign once reaching the destination city.
2D. Direction-only | Indicate using terms such as “Northbound”, “Inbound”, “Clockwise”, or similar directions.
2E. Direction with destination | <direction> to <terminus name>, e.g. “Southbound to San Jose”.
2F. Direction with destination and waypoints | <direction> via <waypoint> to <destination>, e.g. “Northbound via Charing Cross to Highgate”.
3 Do not begin a headsign with the words “To” or “Towards”.
direction_id 4 If trips on a route service opposite directions, distinguish these groups of trips with the direction_id field, using values 0 and 1.
5 Use values 0 and 1 consistently throughout the dataset, i.e.:
  • If 1 = Outbound on the Red route, then 1 = Outbound on the Green route
  • If 1 = Northbound on Route X, then 1 = Northbound on Route Y
  • If 1 = clockwise on Route X then 1 = clockwise on Route Y.
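
Recommendations 1 and 3 for trip_headsign lend themselves to an automated check. This is a rough sketch: headsign_issues is a hypothetical helper, and it only catches a headsign that exactly equals a route name, so partial matches would still need manual review.

```python
def headsign_issues(headsign, route_short_name="", route_long_name=""):
    """Return a list of human-readable problems found in one headsign value."""
    issues = []
    text = headsign.strip()
    first_word = text.split(" ", 1)[0].lower() if text else ""
    if first_word in ("to", "towards"):
        issues.append("begins with To/Towards")
    for name in (route_short_name, route_long_name):
        if name and text.lower() == name.strip().lower():
            issues.append("repeats a route name")
            break
    return issues


print(headsign_issues("To Downtown"))
print(headsign_issues("Highgate via Charing Cross"))
```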

frequencies.txt

Field Name Tags # Recommendation
1 Actual stop times are ignored for trips referenced by frequencies.txt; only travel time intervals between stops are significant for frequency-based trips. For clarity/human readability, it is recommended that the first stop time of a trip referenced in frequencies.txt should begin at 00:00:00 (first arrival_time value of 00:00:00).
block_id 2 Can be provided for frequency-based trips.
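
For a frequency-based trip, the stop times can be shifted so that the first one reads 00:00:00 while the intervals between stops stay unchanged. The helper names below are illustrative assumptions. In practice the same offset (the trip's first arrival_time) would be subtracted from every arrival_time and departure_time of the trip.

```python
def to_seconds(gtfs_time):
    """Convert HH:MM:SS (hours may exceed 23) to seconds."""
    hours, minutes, seconds = (int(part) for part in gtfs_time.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_gtfs_time(total_seconds):
    """Format a number of seconds as HH:MM:SS."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def rebase_to_midnight(times):
    """Shift a trip's times so the first is 00:00:00, keeping the intervals."""
    offset = to_seconds(times[0])
    return [to_gtfs_time(to_seconds(t) - offset) for t in times]


rebased = rebase_to_midnight(["06:10:00", "06:17:30", "06:25:00"])
print(rebased)
```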

routes.txt

Field Name Tags # Recommendation
agency_id 1 Must be included if it is defined in agency.txt.
route_short_name 2 Include route_short_name if there is a brief service designation. This should be the commonly-known passenger name of the service, no longer than 12 characters.
route_long_name 3 The definition from Specification reference: This name is generally more descriptive than the route_short_name and will often include the route's destination or stop. At least one of route_short_name or route_long_name must be specified, or potentially both if appropriate. If the route does not have a long name, please specify a route_short_name and use an empty string as the value for this field. Examples of types of long names are below:
Primary Travel Path or Corridor
Route Name | Form | Agency
“N”/“Judah” | route_short_name/route_long_name | Muni, in San Francisco
“6”/“ML King Jr Blvd” | route_short_name/route_long_name | TriMet, in Portland, Or.
“6”/“Nation - Étoile” | route_short_name/route_long_name | RATP, in Paris, France
“U2”-“Pankow – Ruhleben” | route_short_name-route_long_name | BVG, in Berlin, Germany
Description of the Service
"Hartwell Area Shuttle"
4 route_long_name should not contain the route_short_name.
5 Include the full designation including a service identity when populating route_long_name. Examples:
Service Identity | Recommendation | Examples
“MAX Light Rail” (TriMet, in Portland, Oregon) | The route_long_name should include the brand (MAX) and the specific route designation | “MAX Red Line”, “MAX Blue Line”
“Rapid Ride” (ABQ Ride, in Albuquerque, New Mexico) | The route_long_name should include the brand (Rapid Ride) and the specific route designation | “Rapid Ride Red Line”, “Rapid Ride Blue Line”
route_id 6 All trips on a given named route should reference the same route_id.
  • Different directions of a route should not be separated into different route_id values.
  • Different spans of operation of a route should not be separated into different route_id values (i.e. do not create different records in routes.txt for “Downtown AM” and “Downtown PM” services).
7 If a route group includes distinctly named branches (e.g. 1A and 1B), follow recommendations in the route branches case to determine route_short_name and route_long_name.
route_color & route_text_color 8 Should be consistent with signage and printed and online customer information (and thus not included if they do not exist in other places).
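
Several of the route naming recommendations above (2, 3, and 4) can be screened automatically. The sketch below is a heuristic under stated assumptions: route_name_issues is a hypothetical helper, and the whole-word match is a design choice meant to avoid reporting a short name such as "6" inside "16th Street".

```python
import re


def route_name_issues(short_name, long_name, max_short_len=12):
    """Return a list of naming problems for one routes.txt record."""
    issues = []
    if not short_name and not long_name:
        issues.append("neither route_short_name nor route_long_name is set")
    if len(short_name) > max_short_len:
        issues.append(f"route_short_name is longer than {max_short_len} characters")
    if short_name and long_name:
        # Whole-word match so that "6" is not found inside "16th Street".
        pattern = r"(?<!\w)" + re.escape(short_name) + r"(?!\w)"
        if re.search(pattern, long_name, flags=re.IGNORECASE):
            issues.append("route_long_name contains route_short_name")
    return issues


print(route_name_issues("N", "Judah"))
print(route_name_issues("6", "Route 6 ML King Jr Blvd"))
```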

shapes.txt (alignments)

Field Name Tags # Recommendation
1 Ideally, where alignments are shared (i.e. where Routes 1 and 2 operate on the same segment of roadway or track), the shared portion of the alignment should match exactly. This helps to facilitate high-quality transit cartography.
2 Alignments should follow the centerline of the right of way on which the vehicle travels. This could be either the centerline of the street if there are no designated lanes, or the centerline of the side of the roadway that travels in the direction the vehicle moves.
Alignments should not “jag” to a curb stop, platform, or boarding location.
shape_dist_traveled 3 Must be provided in both shapes.txt and stop_times.txt if an alignment includes looping or inlining (the vehicle crosses or travels over the same portion of alignment in one trip).
[Figure: An Inlining Route]
If a vehicle retraces or crosses the route alignment at points in the course of a trip, shape_dist_traveled is important to clarify how the points in shapes.txt correspond with records in stop_times.txt.
4 The shape_dist_traveled field allows the agency to specify exactly how the stops in the stop_times.txt file fit into their respective shape. A common value to use for the shape_dist_traveled field is the distance from the beginning of the shape as traveled by the vehicle (think something like an odometer reading).
  • Route alignments (in shapes.txt) should be within 100 meters of stop locations which a trip serves.
  • Simplify alignments so that shapes.txt does not contain extraneous points (i.e. remove extra points on straight-line segments; see discussion of line simplification problem).
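
The "odometer reading" interpretation of shape_dist_traveled can be sketched as a running sum of distances between consecutive shape points. The code below assumes great-circle (haversine) distances in meters on a spherical Earth, which is one reasonable choice rather than a requirement; the function names are illustrative.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; an approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cumulative_distances(points):
    """points: (lat, lon) pairs in shape_pt_sequence order. Returns meters from start."""
    distances = [0.0]
    for (lat1, lon1), (lat2, lon2) in zip(points, points[1:]):
        distances.append(distances[-1] + haversine_m(lat1, lon1, lat2, lon2))
    return distances


# An out-and-back shape: the first and last points coincide, but their
# distances differ, which is what disambiguates inlining segments.
dist = cumulative_distances([(0.0, 0.0), (0.0, 1.0), (0.0, 0.0)])
print(dist)
```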

calendar.txt and calendar_dates.txt

Field Name Tags # Recommendation
1 calendar_dates.txt should only contain a limited number of exceptions to the schedule. Regularly-scheduled service should be configured using calendar.txt.
2 Including a calendar.service_name field can also increase the human readability of GTFS, although this is not adopted in the spec.
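
The division of labor between the two files can be seen in how a consumer decides whether a service runs on a given day: calendar.txt supplies the regular weekly pattern, and calendar_dates.txt supplies the exceptions (exception_type 1 adds service, 2 removes it). The following is a minimal sketch; service_runs_on and the sample data are hypothetical.

```python
from datetime import date

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def service_runs_on(day, calendar_row, exceptions):
    """calendar_row: calendar.txt fields; exceptions: {YYYYMMDD: exception_type}."""
    key = day.strftime("%Y%m%d")
    if key in exceptions:
        return exceptions[key] == "1"  # 1 = service added, 2 = service removed
    # YYYYMMDD strings compare correctly as plain text.
    in_range = calendar_row["start_date"] <= key <= calendar_row["end_date"]
    return in_range and calendar_row[WEEKDAYS[day.weekday()]] == "1"


weekday_service = {
    "monday": "1", "tuesday": "1", "wednesday": "1", "thursday": "1",
    "friday": "1", "saturday": "0", "sunday": "0",
    "start_date": "20240101", "end_date": "20241231",
}
exceptions = {"20241225": "2", "20241228": "1"}
print(service_runs_on(date(2024, 12, 25), weekday_service, exceptions))
```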

fare_rules.txt and fare_attributes.txt

Field Name Tags # Recommendation
1 agency_id should be included in fare_attributes.txt if the field is included in agency.txt.
2 If a fare system cannot be accurately modeled, avoid further confusion and leave it blank.
3 Include fares (fare_attributes.txt and fare_rules.txt) and model them as accurately as possible. In edge cases where fares cannot be accurately modeled, the fare should be represented as more expensive rather than less expensive so customers will not attempt to board with insufficient fare. If the vast majority of fares cannot be modeled correctly, do not include fare information in the feed.

Practice Recommendations Organized by Case

This section covers particular cases with implications across files and fields.

Loop Routes

On loop routes, vehicles’ trips begin and end at the same location (sometimes a transit or transfer center). Vehicles usually operate continuously and allow passengers to stay onboard as the vehicle continues its loop.

Below: Loop route. The vehicle returns to the starting point in one trip. Some loop routes offer travel in one direction, and others in two directions.
[Figure: A Loop Route]
Field Name Tags # Recommendation
trips.trip_id 1 Model the complete round-trip for the loop with a single trip.
stop_times.stop_id 2 Include the first/last stop twice in stop_times.txt for the trip that is a loop. Example below.
trip_id stop_id stop_sequence
9000 101 1
9000 102 2
9000 103 3
9000 101 4
Often, a loop route may include first and last trips that do not travel the entire loop. Include these trips as well.
trips.direction_id 3 If a loop operates in opposite directions (i.e. clockwise and counterclockwise), then designate direction_id as 0 or 1.
trips.block_id 4 Indicate continuous loop trips with the same block_id.
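
Using the stop_times example above (trip 9000), a producer could confirm that a loop trip lists its first stop again as its last. This is a small sketch; is_complete_loop is a hypothetical helper, and the second trip is an invented partial trip of the kind mentioned above.

```python
def is_complete_loop(stop_time_rows):
    """True when the trip's first and last stop_id match, ordered by stop_sequence."""
    ordered = sorted(stop_time_rows, key=lambda row: row["stop_sequence"])
    return len(ordered) > 1 and ordered[0]["stop_id"] == ordered[-1]["stop_id"]


trip_9000 = [
    {"trip_id": "9000", "stop_id": "101", "stop_sequence": 1},
    {"trip_id": "9000", "stop_id": "102", "stop_sequence": 2},
    {"trip_id": "9000", "stop_id": "103", "stop_sequence": 3},
    {"trip_id": "9000", "stop_id": "101", "stop_sequence": 4},
]
# Hypothetical last trip of the day that ends before completing the loop.
partial_trip = trip_9000[:3]

print(is_complete_loop(trip_9000), is_complete_loop(partial_trip))
```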

Lasso Routes

Lasso routes combine aspects of a loop route and directional route.

Below: Lasso routes are loop-routes from A to A via B with three sections:
  • straight section from A to B;
  • loop from and to B;
  • straight section from B to A.
[Figure: A Lasso Route]
Examples:
Subway Routes (Chicago)
Bus Suburb to Downtown Routes (St. Albert or Edmonton)
See CTA Brown Line (CTA Website and TransitFeeds)
Field Name Tags # Recommendation
trips.trip_id 1 The full extent of a “vehicle round-trip” (see illustration above) consists of travel from A to B to B and back to A. An entire vehicle round-trip may be expressed by:
  • A single trip_id value/record in trips.txt
  • Multiple trip_id values/records in trips.txt, with continuous travel indicated by block_id.
stop_times.stop_headsign 2 The stops along the A-B section will be passed through in both directions. stop_headsign facilitates distinguishing travel direction. Therefore, providing stop_headsign is recommended for these trips.
Examples:
"A via B"
"A"
Chicago Transit Authority's Purple Line
"Southbound to Loop"
"Northbound via Loop"
"Northbound to Linden"
Edmonton Transit Service Bus Lines, here the 39
"Rutherford"
"Century Park"
trips.trip_headsign 3 The trip headsign should be a global description of the trip, as displayed in the schedules. It could be “Linden to Linden via Loop” (Chicago example), or “A to A via B” (generic example).

Branches

Some routes may include branches. Alignment and stops are shared amongst these branches, but each also serves distinct stops and alignment sections. The relationship among branches may be indicated by route name(s), headsigns, and trip short name using the further guidelines below.

Below: Three potential configurations of route branches. Primary alignment is in black. Branch is colored gold.
[Figure: Configurations of Route Branches]
Field Name Tags # Recommendation
1 In naming branch routes, it is recommended to follow other passenger information materials. Below are descriptions and examples of two cases:
1A If timetables and on-street signage represent two distinctly named routes (e.g. 1A and 1B), then present this as such in the GTFS, using the route_short_name and/or route_long_name fields. Example: GoDurham Transit routes 2, 2A, and 2B demonstrate branched routes with deviations and extensions.
1B If agency-provided information describes branches as the same named route, then utilize the trips.trip_headsign, stop_times.stop_headsign, and/or trips.trip_short_name fields. Example: GoTriangle route 300 travels to different locations depending on the time of day. During peak commuter hours extra legs are added onto the standard route to accommodate workers entering and leaving the city.

About This Document

Objectives

The objectives of maintaining GTFS Best Practices are to:

  • Improve end-user customer experience in public transportation apps
  • Make it easier for software developers to deploy and scale applications, products, and services
  • Facilitate the use of GTFS in various application categories (beyond its original focus on trip planning)

How to propose or amend published GTFS Best Practices

GTFS applications and practice evolve, and so this document may need to be amended from time to time. To propose an amendment to this document, open a pull request in the GTFS Best Practices GitHub repository and advocate for the change. The GTFS Best Practices Working Group will meet quarterly to discuss and approve selected changes. Please send other questions or suggestions to gtfs@rmi.org.

GTFS Best Practices Working Group

The GTFS Best Practices Working Group consists of public transportation providers, developers of GTFS-consuming applications, consultants, and academic organizations working to define common practices and expectations for GTFS data. The goal of this working group is to support greater interoperability of data. To join the working group, email gtfs@rmi.org.

Members of this working group include:

The GTFS Best Practices Working Group is convened and facilitated by Rocky Mountain Institute (RMI). RMI, an independent nonprofit founded in 1982, transforms global energy use to create a clean, prosperous, and secure low-carbon future. It engages businesses, communities, institutions, and entrepreneurs to accelerate the adoption of market-based solutions that cost-effectively shift from fossil fuels to efficiency and renewables. RMI has offices in Basalt and Boulder, Colorado; New York City; Washington, D.C.; and Beijing.