API Documentation
The API enables a list of breached accounts (username and email) to be quickly searched via a RESTful service.
Overview
This is the current version of the API.
Index
- Overview
- Breaches
- Breached Passwords
- Pastes
- Further reading
Specifying the API version
Version 1 of the API is consumable only by specifying the API version in the URL.
Versioning via the URL
This method can easily be invoked directly by requesting the URL with an appropriate user agent string.
GET https://breached.me/api/v1/{service}/{parameter}
Specifying the user agent
Each request to the API must be accompanied by a user agent request header. Typically this should be the name of the app consuming the service. A missing user agent will result in an HTTP 403 response. A valid request would look like:
GET https://breached.me/api/v1/{service}/{parameter}
user-agent: [your app name]
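As a minimal sketch of the requirement above, the following Python builds (but does not send) a request that carries a user agent header. The `build_request` helper and the app name `my-breach-checker` are hypothetical names chosen for illustration; `breaches` is one of the services documented on this page.

```python
from urllib.request import Request

# URL pattern taken from the documentation above.
BASE_URL = "https://breached.me/api/v1"


def build_request(service: str, app_name: str) -> Request:
    """Build a GET request that identifies the consuming app (hypothetical helper)."""
    if not app_name.strip():
        # A missing user agent results in an HTTP 403, so fail early instead.
        raise ValueError("a user agent (app name) is required")
    url = f"{BASE_URL}/{service}"
    return Request(url, headers={"user-agent": app_name}, method="GET")


request = build_request("breaches", "my-breach-checker")
```

The resulting object can be passed to `urllib.request.urlopen` to perform the call.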
Getting all breaches for an account
The most common use of the API is to return a list of all breaches for a particular account. The API takes a single parameter, which is the account to be searched for. The account is not case sensitive, and leading or trailing whitespace will be trimmed. The account should always be URL encoded. This is an authenticated API, and a BM API key must be passed with the request.
GET https://haveibeenpwned.com/api/v3/breachedaccount/{account}
bm-api-key: [your key]
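The trimming, URL encoding and API key header described above might be handled along these lines. This is a sketch only: the `breached_account_request` helper is hypothetical, and the base URL is an assumption that follows the versioned pattern used elsewhere on this page, so substitute the host and version you actually call.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: base URL follows the versioned pattern shown earlier on this page.
BASE_URL = "https://breached.me/api/v1"


def breached_account_request(account: str, api_key: str, app_name: str) -> Request:
    """Build (without sending) a breach search request for one account."""
    # The service trims whitespace itself; trimming client-side keeps URLs tidy.
    normalized = account.strip()
    # safe="" forces characters such as "@", "+" and "/" to be percent-encoded.
    encoded = quote(normalized, safe="")
    url = f"{BASE_URL}/breachedaccount/{encoded}"
    headers = {"bm-api-key": api_key, "user-agent": app_name}
    return Request(url, headers=headers, method="GET")


request = breached_account_request(
    "  Test+User@Example.com ", "your-key", "my-breach-checker"
)
```

Encoding with `safe=""` matters for accounts that contain `+` or `/`, which would otherwise change the meaning of the path.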
Getting all breached sites in the system
A “breach” is an incident in which a system has been compromised by an attacker and its data disclosed. For example, eBay was a breach, Canva was a breach, etc. It is possible to return the details of every breach in the system, which currently stands at 487 breaches.
By version in URL:
GET https://breached.me/api/v1/breaches
Getting a single breached site
Sometimes, you just need to check a single breach that can be retrieved by the breach “name”. “Name” is a stable value which may or may not be the same as the “title” (which can change). See the breach model below for more information.
By version in URL:
Getting all data classes in the system
A “data class” is an attribute of a record compromised in a breach. For example, many breaches expose data classes such as “Email addresses” and “Passwords”. The values returned by this service are ordered alphabetically in a string array and will expand over time as new breaches expose previously unseen classes of data.
By version in URL:
The breached model
Attribute | Type | Description |
---|---|---|
Name | string | A Pascal-cased name representing the breach, unique across all breaches. This value does not change and may be used to name dependent assets (such as images) but must not be shown directly to end-users (see the “Title” attribute instead). |
Title | string | A descriptive title for the breach suitable for displaying to end-users. It’s unique across all breaches, but individual values may change in the future (e.g. if another breach occurs against an organization already in the system). If a stable value is required to reference the breach, refer to the “Name” attribute instead. |
Domain | string | The domain of the primary website where the breach happened. This may be used for identifying other assets and external systems on the site. |
BreachDate | date | The date (only) the breach originally happened (in ISO 8601 format). This is not always accurate — frequently, breaches are discovered and reported long after the original incident. This is just a guide. |
AddedDate | datetime | The (precise) date and time the breach was added to the system (in ISO 8601 format). |
ModifiedDate | datetime | The (precise) date and time the breach was last modified (in ISO 8601 format). This will only differ from the AddedDate attribute if other attributes have changed or the data in the breach itself has changed (e.g. additional data is identified and loaded). It is always either equal to or later than the AddedDate attribute. |
PwnCount | integer | The total number of accounts loaded into the system. |
Description | string | An overview of the breach in HTML markup. This may include markup such as emphasis and strong tags as well as hyperlinks. |
DataClasses | string[] | This attribute describes the nature of the data compromised in the breach and contains an alphabetically ordered string array of impacted data classes. |
IsVerified | boolean | Indicates whether the breach is considered verified. An unverified breach (a value of false) may not have been hacked from the indicated website. An unverified breach is still loaded into BM when there’s sufficient confidence that a significant portion of the data is legitimate. |
IsFabricated | boolean | Indicates that the breach is considered fabricated. A fabricated breach is unlikely to have been hacked from the indicated website and usually contains a large amount of manufactured data. However, it still contains legitimate email addresses and asserts that the account owners were compromised in the alleged breach. |
IsSensitive | boolean | Indicates if the breach is considered sensitive. The public API will not return any accounts for a breach flagged as sensitive. |
IsRetired | boolean | Indicates if the breach has been retired. This data has been permanently removed and will not be returned by the API. |
IsSpamList | boolean | Indicates if the breach is considered a spam list. This flag has no impact on any other attributes but it means that the data has not come as a result of a security compromise. |
LogoPath | string | A URI that specifies where a logo for the breached service can be found. Logos should always be in PNG format. |
Sample breach response
All responses return breached models either in a collection (breaches for an account or all breaches in the system) or as a single item (retrieving a breach by name). When a collection is returned, it’s sorted alphabetically by the title of the breach.
[
  {
    "Name":"Adobe",
    "Title":"Adobe",
    "Domain":"adobe.com",
    "BreachDate":"2013-10-04",
    "AddedDate":"2013-12-04T00:00Z",
    "ModifiedDate":"2013-12-04T00:00Z",
    "PwnCount":152445165,
    "Description":"In October 2013, 153 million Adobe accounts were breached with each containing an internal ID, username, email, encrypted password and a password hint in plain text. The password cryptography was poorly done and many were quickly resolved back to plain text. The unencrypted hints also disclosed much about the passwords adding further to the risk that hundreds of millions of Adobe customers already faced.",
    "DataClasses":["Email addresses","Password hints","Passwords","Usernames"],
    "IsVerified":true,
    "IsFabricated":false,
    "IsSensitive":false,
    "IsRetired":false,
    "IsSpamList":false,
    "LogoPath":"https://haveibeenpwned.com/Content/Images/PwnedLogos/Adobe.png"
  },
  {
    "Name":"BattlefieldHeroes",
    "Title":"Battlefield Heroes",
    "Domain":"battlefieldheroes.com",
    "BreachDate":"2011-06-26",
    "AddedDate":"2014-01-23T13:10Z",
    "ModifiedDate":"2014-01-23T13:10Z",
    "PwnCount":530270,
    "Description":"In June 2011 as part of a final breached data dump, the hacker collective \"LulzSec\" obtained and released over half a million usernames and passwords from the game Battlefield Heroes. The passwords were stored as MD5 hashes with no salt and many were easily converted back to their plain text versions.",
    "DataClasses":["Passwords","Usernames"],
    "IsVerified":true,
    "IsFabricated":false,
    "IsSensitive":false,
    "IsRetired":false,
    "IsSpamList":false,
    "LogoPath":"https://haveibeenpwned.com/Content/Images/PwnedLogos/BattlefieldHeroes.png"
  }
]
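A consumer might map a response like the one above onto typed values before using it. The sketch below is illustrative only: the `Breach` container and `parse_breach` helper are hypothetical, only a subset of attributes is kept, and the timestamp format is assumed to match the minute-precision values in the sample (real responses may carry seconds).

```python
import json
from dataclasses import dataclass
from datetime import date, datetime, timezone


@dataclass(frozen=True)
class Breach:
    """A subset of the breach attributes (hypothetical container)."""
    name: str
    title: str
    breach_date: date
    added_date: datetime
    pwn_count: int
    data_classes: tuple
    is_verified: bool


def parse_breach(raw: dict) -> Breach:
    # Assumption: timestamps look like the sample, e.g. "2013-12-04T00:00Z" (UTC).
    added = datetime.strptime(raw["AddedDate"], "%Y-%m-%dT%H:%MZ")
    return Breach(
        name=raw["Name"],
        title=raw["Title"],
        breach_date=date.fromisoformat(raw["BreachDate"]),
        added_date=added.replace(tzinfo=timezone.utc),
        pwn_count=raw["PwnCount"],
        data_classes=tuple(raw["DataClasses"]),
        is_verified=raw["IsVerified"],
    )


# Trimmed copy of the sample response, reduced to the attributes used here.
SAMPLE = """[
  {"Name": "Adobe", "Title": "Adobe", "BreachDate": "2013-10-04",
   "AddedDate": "2013-12-04T00:00Z", "PwnCount": 152445165,
   "DataClasses": ["Email addresses", "Password hints", "Passwords", "Usernames"],
   "IsVerified": true},
  {"Name": "BattlefieldHeroes", "Title": "Battlefield Heroes",
   "BreachDate": "2011-06-26", "AddedDate": "2014-01-23T13:10Z",
   "PwnCount": 530270, "DataClasses": ["Passwords", "Usernames"],
   "IsVerified": true}
]"""

breaches = [parse_breach(item) for item in json.loads(SAMPLE)]
# Titles, not names, are the values intended for display to end-users.
exposing_emails = [b.title for b in breaches if "Email addresses" in b.data_classes]
```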
Breached Passwords overview
Breached Passwords are more than half a billion passwords that have previously been exposed in data breaches. The service is detailed in the launch blog post. The entire data set is both downloadable and searchable online via the Breached Passwords page.
Each password is stored as an SHA-1 hash of a UTF-8 encoded password. The downloadable source data delimits the full SHA-1 hash and the password count with a colon (:) and each line with a CRLF.
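The storage format described above can be reproduced locally. This is a small sketch using Python's standard `hashlib`; the helper names are hypothetical, the uppercase hex form is an assumption based on the sample output elsewhere on this page, and the count in the example line is made up.

```python
import hashlib


def sha1_hex(password: str) -> str:
    """Return the uppercase SHA-1 hex digest of the UTF-8 encoded password."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def parse_download_line(line: str) -> tuple[str, int]:
    """Split one 'HASH:COUNT' line (CRLF-terminated) from the downloadable data."""
    digest, _, count = line.rstrip("\r\n").partition(":")
    return digest, int(count)


digest = sha1_hex("password")
# The count 3 is invented purely to show the line format.
record = parse_download_line(digest + ":3\r\n")
```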
Searching by range
In order to protect the value of the source password being searched for, Breached Passwords also implements a k-Anonymity model that allows a password to be searched for by partial hash. This allows the first 5 characters of an SHA-1 password hash (not case-sensitive) to be passed to the API:
GET https://api.breachedpasswords.com/range/{first 5 hash chars}
When a password hash with the same first 5 characters is found in the Breached Passwords repository, the API will respond with an HTTP 200 and include the suffix of every hash beginning with the specified prefix, followed by a count of how many times it appears in the data set. The API consumer can then search the results of the response for the presence of their source hash and if not found, the password does not exist in the data set. A sample response for the hash prefix “21BD1” would be as follows:
0018A45C4D1DEF81644B54AB7F969B88D65:1
00D4F6E8FA6EECAD2A3AA415EEC418D38EC:2
011053FD0102E94D6AE2F8B83D76FAF94F6:1
012A7CA357541F0AC487871FEEC1891C49C:2
0136E006E24E7D152139815FB0FC6A50B15:2
A range search typically returns approximately 500 hash suffixes, although this number will differ depending on the hash prefix being searched for and will increase as more passwords are added. There are 1,048,576 different hash prefixes between 00000 and FFFFF (16^5) and every single one will return HTTP 200; there is no circumstance in which the API should return HTTP 404.
Code | Body | Description |
---|---|---|
200 | Hash suffixes counts | Ok — all password hashes beginning with the searched prefix are returned alongside prevalence counts |
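The client side of the range search can be sketched as follows: split the full hash into the 5-character prefix that is sent and the 35-character suffix that never leaves the client, then look for that suffix in the response. The function names are hypothetical, and the response body is the first two lines of the sample shown above rather than live data.

```python
def split_hash(sha1_hash: str, prefix_length: int = 5) -> tuple[str, str]:
    """Split a SHA-1 hex digest into the prefix to send and the suffix to keep."""
    normalized = sha1_hash.strip().upper()
    return normalized[:prefix_length], normalized[prefix_length:]


def count_in_range(response_body: str, suffix: str) -> int:
    """Return the prevalence count for `suffix`, or 0 when it is absent."""
    for line in response_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix.upper():
            return int(count)
    return 0


# First two lines of the sample response for the prefix "21BD1".
SAMPLE_BODY = (
    "0018A45C4D1DEF81644B54AB7F969B88D65:1\r\n"
    "00D4F6E8FA6EECAD2A3AA415EEC418D38EC:2\r\n"
)

prefix, suffix = split_hash("21bd100d4f6e8fa6eecad2a3aa415eec418d38ec")
occurrences = count_in_range(SAMPLE_BODY, suffix)
```

Only `prefix` would be placed in the request URL; a result of 0 means the password is not in the data set.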
Introducing padding
In order to further strengthen privacy, padding can be added to responses so that anyone able to intercept the encrypted responses from the API can’t determine which hash prefix was searched for by observing the response size. Padding is enabled with a request header and ensures that all responses contain between 800 and 1,000 results, regardless of the number of hash suffixes returned by the service.
Header | Example | Description |
---|---|---|
Add-Padding | Add-Padding: true | Pads out responses to ensure all results contain a random number of records between 800 and 1,000. |
Note: Padded entries always have a password count of 0 and can be discarded once received.
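Discarding padded entries on the client could look like the sketch below. The `strip_padding` helper is hypothetical, and the second line of the example body is an invented padded entry used only to show the filtering.

```python
# Request header that enables padding, as described in the table above.
PADDING_HEADERS = {"Add-Padding": "true"}


def strip_padding(response_body: str) -> dict[str, int]:
    """Keep only real results, dropping padded entries whose count is 0."""
    results = {}
    for line in response_body.splitlines():
        suffix, _, count = line.strip().partition(":")
        if not suffix or not count:
            continue  # skip blank or malformed lines
        if int(count) > 0:
            results[suffix] = int(count)
    return results


# The second entry is made up to stand in for a padded record.
PADDED_BODY = (
    "0018A45C4D1DEF81644B54AB7F969B88D65:1\r\n"
    + "A" * 35 + ":0\r\n"
)

real_results = strip_padding(PADDED_BODY)
```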
Getting all pastes for an account
The API takes a single parameter, which is the email address to be searched for. The email is not case sensitive, and leading or trailing whitespace will be trimmed. The email should always be URL encoded. This is an authenticated API, and a BM API key must be passed with the request.
GET https://breached.me/api/v1/dataclasses
bm-api-key: [your key]
The paste model
Each paste contains a number of attributes describing it. The current attributes are:
Attribute | Type | Description |
---|---|---|
Source | string | The paste service the record was retrieved from. Current values are: Pastebin, Pastie, Slexy, Ghostbin, QuickLeak, JustPaste, AdHocUrl, PermanentOptOut, OptOut |
Id | string | The ID of the paste as it was given at the source service. Combined with the “Source” attribute, this can be used to resolve the URL of the paste. |
Title | string | The title of the paste as observed on the source site. This may be null and if so will be omitted from the response. |
Date | date | The date and time (precision to the second) that the paste was posted. This is taken directly from the paste site when this information is available but may be null if no date is published. |
EmailCount | integer | The number of emails that were found when processing the paste. Emails are extracted by using the regular expression \b[a-zA-Z0-9\.\-_\+]+@[a-zA-Z0-9\.\-_]+\.[a-zA-Z]+\b |
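To see what the regular expression quoted for EmailCount matches, it can be tried locally. The pattern below is copied from the table; the `extract_emails` helper and the sample text are hypothetical, and how the service counts repeated addresses is not stated here.

```python
import re

# Pattern copied from the EmailCount description, written as a raw string.
EMAIL_PATTERN = re.compile(r"\b[a-zA-Z0-9\.\-_\+]+@[a-zA-Z0-9\.\-_]+\.[a-zA-Z]+\b")


def extract_emails(text: str) -> list[str]:
    """Return every substring of `text` that the pattern matches."""
    return EMAIL_PATTERN.findall(text)


sample_text = "Contact alice@example.com or bob.smith+tag@mail.example.org today"
found = extract_emails(sample_text)
```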
Sample paste response
Searching an account for pastes always returns a collection of paste entities. The collection is sorted chronologically with the newest paste first.
[
  {
    "Source":"Pastebin",
    "Id":"8Q0BvKD8",
    "Title":"syslog",
    "Date":"2014-03-04T19:14:54Z",
    "EmailCount":139
  },
  {
    "Source":"Pastie",
    "Id":"7152479",
    "Date":"2013-03-28T16:51:10Z",
    "EmailCount":30
  }
]
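Because “Title” may be omitted and “Date” may be null, a consumer should read both defensively. The following sketch parses the sample above; the `paste_date` helper and the `(untitled)` placeholder are hypothetical, and the timestamp format is assumed to match the second-precision values shown.

```python
import json
from datetime import datetime, timezone

SAMPLE = """[
  {"Source": "Pastebin", "Id": "8Q0BvKD8", "Title": "syslog",
   "Date": "2014-03-04T19:14:54Z", "EmailCount": 139},
  {"Source": "Pastie", "Id": "7152479",
   "Date": "2013-03-28T16:51:10Z", "EmailCount": 30}
]"""


def paste_date(raw: dict):
    """Return the paste's UTC timestamp, or None when no date was published."""
    value = raw.get("Date")
    if value is None:
        return None
    # Assumption: dates look like the sample, e.g. "2014-03-04T19:14:54Z".
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


pastes = json.loads(SAMPLE)
# A missing "Title" key is replaced by a placeholder for display.
titles = [p.get("Title", "(untitled)") for p in pastes]
dates = [paste_date(p) for p in pastes]
```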
Cross-origin resource sharing (CORS)
CORS is only supported for non-authenticated APIs. When supported, it accepts all origins — you can hit the API from websites on any other domain.
HTTPS
All API endpoints must be invoked over HTTPS. Any requests over HTTP will result in a 301 response with a redirect to the same path on the secure scheme. Only TLS versions 1.2 and 1.3 are supported; older versions of the protocol will not allow a connection to be made.
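On the client side, the TLS requirement can be mirrored so that a connection is never attempted with an older protocol version. This is a sketch using Python's standard `ssl` module; whether you need it depends on your runtime's defaults.

```python
import ssl

# Start from the platform defaults (certificate and hostname verification on).
context = ssl.create_default_context()
# Refuse anything older than TLS 1.2, matching the versions the API accepts.
context.minimum_version = ssl.TLSVersion.TLSv1_2
```

The context can then be supplied to the HTTP client, for example via the `context` argument of `urllib.request.urlopen`.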
Response codes
Semantic HTTP response codes are used to indicate the result of the search:
Code | Description |
---|---|
200 | Ok — everything worked and there’s a string array of pwned sites for the account |
400 | Bad request — the account does not comply with an acceptable format (i.e. it’s an empty string) |
401 | Unauthorized — either no API key was provided or it wasn’t valid |
403 | Forbidden — no user agent has been specified in the request |
404 | Not found — the account could not be found and has therefore not been pwned |
429 | Too many requests — the rate limit has been exceeded |
503 | Service unavailable — usually returned by Cloudflare if the underlying service is not available |
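A client might translate these codes into its own outcomes along the following lines. This is a sketch: the meanings are paraphrased from the table above, the helper names are hypothetical, and treating only 429 and 503 as retryable is an assumption rather than documented behavior.

```python
# Meanings paraphrased from the response code table above.
STATUS_MEANINGS = {
    200: "account found in at least one breach",
    400: "account is not in an acceptable format",
    401: "API key missing or invalid",
    403: "user agent missing",
    404: "account not found, so not breached",
    429: "rate limit exceeded",
    503: "service unavailable",
}

# Assumption: only these two codes are worth retrying after a pause.
RETRYABLE = frozenset({429, 503})


def describe_status(status_code: int) -> str:
    """Return a short description for a documented status code."""
    return STATUS_MEANINGS.get(status_code, f"unexpected status {status_code}")


def should_retry(status_code: int) -> bool:
    """Tell the caller whether waiting and retrying could succeed."""
    return status_code in RETRYABLE
```

Note that a 404 is a normal, successful answer here (the account has not been found in any breach), so it should not be reported to users as an error.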
Abuse
There’s not much point in abusing the API; if you want to build up a treasure trove of breached email addresses or usernames, go and download the dumps (they’re usually just a Google search away) and save yourself the hassle and time of trying to enumerate an API one account at a time. That said, use of the API must stay within the acceptable use described below.
Rate limiting
Requests to the breaches and pastes APIs are each limited to one every 1,500 milliseconds from any given BM API key (a key may request both APIs within this period). Any request that exceeds the limit will receive an HTTP 429 “Too many requests” response. The response also includes a “retry-after” header expressing the number of seconds remaining before the client can make a successful API call (the value is rounded up to the next whole second).
The response body explains the rate limit and refers to the acceptable use section of this documentation.
retry-after: 2
{ "statusCode": 429, "message": "Rate limit is exceeded. Try again in 2 seconds." }
The retry period is not fixed: querying the API more aggressively than the allowed rate restarts the retry period with each failed request. It’s advisable to avoid querying the API at exactly the rate limit, as network behavior may result in some requests arriving within the retry period and causing a 429. Adding an extra 100-millisecond delay between requests on top of the rate limit should normally ensure this won’t happen.
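One way to stay under the limit is to space requests on the client and to honor the retry-after header after a 429. The sketch below is a hypothetical `Throttle` helper, not part of the API: the interval combines the 1,500 ms limit with the suggested 100 ms margin, and the clock and sleep functions are injectable so the logic can be exercised without waiting.

```python
import time

RATE_LIMIT_SECONDS = 1.5     # one request per 1,500 ms, per the text above
SAFETY_MARGIN_SECONDS = 0.1  # the extra 100 ms suggested above


class Throttle:
    """Space out requests and back off after a 429 (hypothetical helper)."""

    def __init__(self, interval=RATE_LIMIT_SECONDS + SAFETY_MARGIN_SECONDS,
                 clock=time.monotonic, sleep=time.sleep):
        self._interval = interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Sleep until a request may be sent; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Reserve the next slot one interval after this request goes out.
        self._next_allowed = max(now, self._next_allowed) + self._interval
        return delay

    def back_off(self, retry_after_header: str) -> None:
        """After a 429, hold all requests for the advertised number of seconds."""
        pause = float(retry_after_header) + SAFETY_MARGIN_SECONDS
        self._next_allowed = self._clock() + pause
```

Calling `wait()` before every request and `back_off(value)` with the retry-after value whenever a 429 arrives avoids repeatedly restarting the retry period.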
When the rate limit is consistently exceeded, further defenses may be employed to limit the ability to query the API. These defenses include blocks or JavaScript challenges by Cloudflare which may result in an HTTP 503 “Service Unavailable” response.
There is no rate limit for the Breached Passwords API.
Acceptable use
The API has been designed to make it easy for people to do awesome things with it. Things that are not awesome include:
- Querying the data for purposes that are intended to cause more harm to the victims of data breaches
- Anything deliberately intended to limit service availability, such as denial-of-service attacks
- Deliberate attempts to bypass measures designed to ensure acceptable use
- Failing to identify the user agent in a way that accurately describes the consumer of the API
- Misrepresenting the consuming client by impersonating other user agents in an attempt to obfuscate API requests
- Other services designed to fraudulently represent the Breached Me name or brand
- Misrepresenting the source of the data as originating from somewhere other than Breached Me
- Not adhering to the Creative Commons Attribution License as described below
- Automating the consumption of other APIs not explicitly documented on this page
- Using the service in a fashion that brings Breached Me into disrepute
License — breach & paste APIs
This work is licensed under a Creative Commons Attribution 4.0 International License.
In other words, you’re welcome to use the public API to build other services, but you must identify Breached Me as the source of the data. Clear and visible attribution with a link to breached.me should be present anywhere data from the service is used, including when searching breaches or pastes and when representing breach descriptions. It doesn’t have to be overt, but the interface in which Breached Me data is presented should clearly attribute the source per the Creative Commons Attribution 4.0 International License.
In order to help maximize adoption, there are no licensing or attribution requirements on the Breached Passwords API; if you’d like one, you can contact us.