Protecting Exposed APIs: Avoid Data Leaks with SlashID Gate and OPA

How did the Duolingo leak happen?

In March, Ivano Somaini wrote a tweet disclosing an unauthenticated Duolingo API as part of his Open Source Intelligence (OSINT) work.

The issue is straightforward. A simple API call to the https://www.duolingo.com/2017-06-30/users?email endpoint reveals several private details about users and allows attackers to enumerate registered emails. Below is an example output:

{
  "users": [
    {
      "joinedClassroomIds": [],
      "streak": 0,
      "motivation": "none",
      "acquisitionSurveyReason": "none",
      "shouldForceConnectPhoneNumber": false,
      "picture": "//simg-ssl.duolingo.com/avatar/default_2",
      "learningLanguage": "ru",
      "hasFacebookId": false,
      "shakeToReportEnabled": null,
      "liveOpsFeatures": [
        {
          "startTimestamp": 1693007940,
          "type": "TIMED_PRACTICE",
          "endTimestamp": 1693180740
        }
      ],
      "canUseModerationTools": false,
      "id": 184078602543312,
      "betaStatus": "INELIGIBLE",
      "hasGoogleId": false,
      "privacySettings": [],
      "fromLanguage": "en",
      "hasRecentActivity15": false,
      "_achievements": [],
      "observedClassroomIds": [],
      "username": "example",
      "bio": "",
      "profileCountry": "US",
      "chinaUserModerationRecords": [],
      "globalAmbassadorStatus": {},
      "currentCourseId": "DUOLINGO_RU_EN",
      "hasPhoneNumber": false,
      "creationDate": 146229322008,
      "achievements": [],
      "hasPlus": false,
      "name": "o",
      "roles": ["users"],
      "classroomLeaderboardsEnabled": false,
      "emailVerified": false,
      "courses": [
        {
          "preload": false,
          "placementTestAvailable": false,
          "authorId": "duolingo",
          "title": "Russian",
          "learningLanguage": "ru",
          "xp": 370,
          "healthEnabled": true,
          "fromLanguage": "en",
          "crowns": 7,
          "id": "DUOLINGO_RU_EN"
        }
      ],
      "totalXp": 370,
      "streakData": {
        "currentStreak": null
      }
    }
  ]
}

Armed with this API, an attacker published a dump of 2.6 million user records on VX-Underground.

This kind of incident is far from isolated, and Duolingo is just one of many examples. In a similar incident in 2021, the “Add Friend” API allowed linking phone numbers to user accounts, costing Facebook over $275 million in fines from the Irish Data Protection Commission.

Introducing Gate

At SlashID, we believe that security begins with Identity. Gate is our identity-aware edge authorizer to protect APIs and workloads.

Gate can be used to monitor or enforce authentication, authorization and identity-based rate limiting on APIs and workloads, as well as to detect, anonymize, or block personally identifiable information (PII) exposed through your APIs or workloads.

Read on to learn how to deploy Gate to prevent data breaches like the ones mentioned above.

Deploying Gate

Gate can be deployed in multiple ways: as a sidecar for your service, as an external authorizer for Envoy, an ingress proxy or a plugin for your favorite API Gateway. See more in the Gate configuration docs.

For this toy example we chose a simple Docker Compose deployment, which looks like this:

version: '3.7'

services:
  backend:
    build: backend
    ports:
      - 8000:8000
    environment:
      - PORT=8000
    env_file:
      - envs/env.env
    restart: on-failure

  gate:
    image: slashid/gate:latest
    volumes:
      - ./gate.yaml:/gate/gate.yaml
    ports:
      - 8080:8080
    env_file:
      - envs/env.env
    command: --yaml /gate/gate.yaml
    restart: on-failure
    depends_on:
      - backend

The Docker Compose file spawns two services: Gate and a toy backend.

Simulating the leaky API

Our toy backend contains a REST API that behaves similarly to the Duolingo one:

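from typing import Optional

from fastapi import FastAPI, HTTPException, Query

app = FastAPI()

# In-memory user store standing in for a real database.
users = [
    {'email': 'test@example.com', 'name': 'Test User', 'id': 1},
    {'email': 'john@example.com', 'name': 'John Doe', 'id': 2},
    # ... add more users if needed
]

def get_user_by_email(email: str) -> Optional[dict]:
    """Return the matching user record, or None if the email is not registered."""
    for user in users:
        if user['email'] == email:
            return user
    return None

# No authentication is required: anyone can look up a user by email.
@app.get("/get_user/", tags=["business"])
async def read_user(email: str = Query(..., description="The email of the user to search for")):
    user = get_user_by_email(email)
    if user is None:
        raise HTTPException(status_code=404, detail="User not found")
    return user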

Let’s test it:

curl 'http://gate:8080/get_user/?email=test@example.com' | jq
{
  "email": "test@example.com",
  "name": "Test User",
  "id": 1
}
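To make the risk concrete, here is a minimal sketch that replays the enumeration attack against the toy lookup, in-process and without any network calls. The candidate list and the enumerate_emails helper are hypothetical names used only for illustration; a real attacker would issue one HTTP request per candidate address instead.

```python
from typing import Optional

# Same toy data as the backend above.
users = [
    {"email": "test@example.com", "name": "Test User", "id": 1},
    {"email": "john@example.com", "name": "John Doe", "id": 2},
]


def get_user_by_email(email: str) -> Optional[dict]:
    """Return the user record for an email, or None if it is not registered."""
    for user in users:
        if user["email"] == email:
            return user
    return None


def enumerate_emails(candidates: list[str]) -> dict[str, dict]:
    """Hypothetical helper: keep every candidate that resolves to a record."""
    found = {}
    for email in candidates:
        user = get_user_by_email(email)
        if user is not None:
            found[email] = user
    return found


# Made-up candidate addresses, e.g. taken from an unrelated breach dump.
candidates = ["test@example.com", "nobody@example.com", "john@example.com"]
leaked = enumerate_emails(candidates)
print(sorted(leaked))
```

Every hit both confirms that the address is registered and returns the profile attached to it, which is exactly the combination that made the Duolingo endpoint valuable to scrapers.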

Detecting PII data through Gate

Gate has a plugin-based architecture and we expose several built-in plugins. In particular, the PII Anonymizer plugin allows the detection and anonymization of PII or other sensitive data.

The PII Anonymizer plugin can be configured to exclusively monitor PII (as opposed to editing the traffic) by setting the anonymizers rule to keep. We’ll show an example in the next section.

Let’s see a simple Gate configuration that detects email addresses and rewrites the HTTP response to anonymize the field with a hash of the email address:

gate:
  port: 8080
  log:
    format: text
    level: info

  plugins_http_cache:
    - pattern: '*'
      cache_control_override: private, max-age=600, stale-while-revalidate=300

  plugins:
    - id: pii_anonymizer
      type: anonymizer
      enabled: false
      intercept: request_response
      parameters:
        anonymizers: |
          EMAIL_ADDRESS:
            type: hash

  urls:
    - pattern: '*/get_user'
      target: http://backend:8000
      plugins:
        pii_anonymizer:
          enabled: true

Let’s test it:

curl 'http://gate:8080/api/get_user/?email=test@example.com' | jq
{
  "email": "973dfe463ec85785f5f95af5ba3906eedb2d931c24e69824a89ea65dba4e813b",
  "id": 1,
  "name": "Test User"
}
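For intuition, the rewriting step can be approximated in a few lines of Python. This is only a sketch under stated assumptions: Gate's actual detection engine and hash function are not described in this post, the regular expression is deliberately simplistic, and SHA-256 is assumed because it is consistent with the 64-hex-character value in the output above. The hash_emails helper is a hypothetical name.

```python
import hashlib
import json
import re

# Deliberately simple email pattern for the sketch; real detectors are more thorough.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def hash_emails(value):
    """Recursively replace email addresses in a decoded JSON value with a SHA-256 hex digest."""
    if isinstance(value, dict):
        return {key: hash_emails(item) for key, item in value.items()}
    if isinstance(value, list):
        return [hash_emails(item) for item in value]
    if isinstance(value, str):
        return EMAIL_RE.sub(
            lambda match: hashlib.sha256(match.group(0).encode("utf-8")).hexdigest(),
            value,
        )
    return value


body = json.loads('{"email": "test@example.com", "name": "Test User", "id": 1}')
anonymized = hash_emails(body)
print(json.dumps(anonymized, indent=2))
```

Hashing rather than deleting the field keeps the response shape intact, and the same email always maps to the same digest, so clients can still correlate records without seeing the address.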

Detecting PII and blocking the request with OPA

Note: similarly to the PII detection plugin, the OPA plugin can also be run in monitoring mode. See the end of the blogpost to find out more.

Sometimes hashing the PII is not enough and you want to block the request entirely. Let’s see how to combine the PII detection plugin with the OPA plugin to detect and block requests whose responses contain PII.

Note: In the examples below we embed the OPA policies directly in the Gate config, but they can also be served through a bundle. Please check out our documentation to learn more about the plugin.

gate:
  port: 8080
  log:
    format: text
    level: info

  plugins_http_cache:
    - pattern: '*'
      cache_control_override: private, max-age=600, stale-while-revalidate=300

  plugins:
    - id: authz_deny_pii
      type: opa
      enabled: false
      intercept: response
      parameters:
        <<: *slashid_config
        policy_decision_path: /authz/allow
        policy: |
          package authz

          import future.keywords.if

          default allow := false

          no_key_found(obj, key) {
            not obj[key]
          }

          allow if no_key_found(input.response.http.headers, "X-Gate-Anonymize-1")

    - id: pii_anonymizer
      type: anonymizer
      enabled: false
      intercept: request_response
      parameters:
        anonymizers: |
          DEFAULT:
            type: keep
  urls:
    - pattern: '*/get_user'
      target: http://backend:8000
      plugins:
        pii_anonymizer:
          enabled: true
        authz_deny_pii:
          enabled: true

The authz_deny_pii instance of the OPA plugin enforces an OPA policy that blocks a request if the response contains an X-Gate-Anonymize-1 header. The PII detection plugin adds this header to signal the presence of PII.
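The decision itself is small enough to mirror in Python. The sketch below is not Gate code: the allow and parse_finding helpers are hypothetical, and the reading of the header value as "path, start offset, end offset, entity type" is an assumption inferred from sample values such as $.body.email 0 64 EMAIL_ADDRESS.

```python
from typing import NamedTuple


class PiiFinding(NamedTuple):
    path: str         # where the match was found, e.g. "$.body.email"
    start: int        # assumed: start offset of the match within the field
    end: int          # assumed: end offset of the match within the field
    entity_type: str  # detector label, e.g. "EMAIL_ADDRESS"


def parse_finding(header_value: str) -> PiiFinding:
    """Split a value such as '$.body.email 0 64 EMAIL_ADDRESS' into four fields."""
    path, start, end, entity_type = header_value.split()
    return PiiFinding(path, int(start), int(end), entity_type)


def allow(response_headers: dict) -> bool:
    """Mirror of the policy: allow only when the PII marker header is absent."""
    return "X-Gate-Anonymize-1" not in response_headers


clean = {"Content-Type": "application/json"}
flagged = {**clean, "X-Gate-Anonymize-1": "$.body.email 0 64 EMAIL_ADDRESS"}

print(allow(clean), allow(flagged))
print(parse_finding(flagged["X-Gate-Anonymize-1"]))
```

Because the policy defaults to deny and only allows when the marker is missing, any response flagged by the PII plugin is rejected without the policy needing to understand the payload.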

Let’s see it in action:

/usr/server/app $ curl --verbose 'http://gate:8080/api/get_user/?email=test@example.com' | jq
* processing: http://gate:8080/api/get_user/?email=test@example.com
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 172.27.0.5:8080...
* Connected to gate (172.27.0.5) port 8080
> GET /api/get_user/?email=test@example.com HTTP/1.1
> Host: gate:8080
> User-Agent: curl/8.2.1
> Accept: */*
>
< HTTP/1.1 403 Forbidden
< Cache-Control: private, max-age=600, stale-while-revalidate=300
< Content-Length: 0
< Content-Type: application/json
< Date: Sat, 02 Sep 2023 13:58:00 GMT
< Server: uvicorn
< Via: 1.0 gate
< X-Gate-Anonymize-1: $.body.email 0 64 EMAIL_ADDRESS
<
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
* Connection #0 to host gate left intact

Note that in this example pii_anonymizer is set to monitoring mode: type: keep for all PII types (DEFAULT). The plugin allows PII to pass through unchanged, without replacing it with an anonymized version of the data or changing the traffic in any way.

- id: pii_anonymizer
  type: anonymizer
  enabled: false
  intercept: request_response
  parameters:
    anonymizers: |
      DEFAULT:
        type: keep

Differential policy enforcement for authenticated users

Let’s now enforce a new OPA policy that blocks requests containing PII only if the user is not authenticated, while allowing PII in requests of authenticated users.

For simplicity, in this example we’ll use SlashID Access to handle authentication, but any Identity Provider (IdP) would be suitable.

gate:
  port: 8080
  log:
    format: text
    level: info

  plugins_http_cache:
    - pattern: '*'
      cache_control_override: private, max-age=600, stale-while-revalidate=300

  plugins:
    - id: authz_allow_if_authed_pii
      type: opa
      enabled: false
      intercept: response
      parameters:
        <<: *slashid_config
        policy_decision_path: /authz/allow
        policy: |
          package authz

          import future.keywords.if

          default allow := false

          key_found(obj, key) if { obj[key] }

          jwks_request := http.send({
              "cache": true,
              "method": "GET",
              "url": "https://api.slashid.com/.well-known/jwks.json"
          })
          valid_signature if io.jwt.verify_rs256(input.request.token, jwks_request.raw_body)

          allow if not key_found(input.response.http.headers, "X-Gate-Anonymize-1")
          allow if valid_signature

    - id: pii_anonymizer
      type: anonymizer
      enabled: false
      intercept: request_response
      parameters:
        anonymizers: |
          DEFAULT:
            type: keep
  urls:
    - pattern: '*/get_user'
      target: http://backend:8000
      plugins:
        pii_anonymizer:
          enabled: true
        authz_allow_if_authed_pii:
          enabled: true

This rule is a bit more complicated, so let’s walk through it step by step.

First, we retrieve the JSON Web Key Set (JWKS) from https://api.slashid.com/.well-known/jwks.json.
Then, we check that either the incoming authorization token has a valid RS256 signature signed by SlashID or that X-Gate-Anonymize-1 is not present.
If either condition is true, the request is allowed. Let’s see this in action:

curl --verbose -L 'http://gate:8080/api/get_user/?email=test@example.com' | jq
* processing: http://gate:8080/api/get_user/?email=test@example.com
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 172.27.0.5:8080...
* Connected to gate (172.27.0.5) port 8080
> GET /api/get_user/?email=test@example.com HTTP/1.1
> Host: gate:8080
> User-Agent: curl/8.2.1
> Accept: */*
>
< HTTP/1.1 403 Forbidden
< Cache-Control: private, max-age=600, stale-while-revalidate=300
< Content-Length: 0
< Content-Type: application/json
< Date: Sat, 02 Sep 2023 16:04:24 GMT
< Server: uvicorn
< Via: 1.0 gate
< X-Gate-Anonymize-1: $.body.email 0 64 EMAIL_ADDRESS
<
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* Connection #0 to host gate left intact

The request above is blocked because there is PII in the response and no valid JWT has been provided.

Let’s send a request with a valid token:

curl -H "Authorization: Bearer <TOKEN>" 'http://gate:8080/api/get_user/?email=test@example.com' | jq
{
  "email": "test@example.com",
  "id": 1,
  "name": "Test User"
}

Note that in this case we configured the PII plugin to flag the presence of PII without replacing or obfuscating it, which is why we see the original clear-text response.
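The two allow rules combine as a logical OR, which the following sketch spells out as a truth table. It is an illustration rather than Gate's implementation: the allow function is hypothetical, and fake_verify is a stand-in for real signature verification, which in the policy is done by io.jwt.verify_rs256 against the JWKS.

```python
from typing import Callable, Optional


def allow(
    response_headers: dict,
    token: Optional[str],
    verify: Callable[[str], bool],
) -> bool:
    """Allow when no PII was flagged, or when the caller presents a valid token."""
    pii_flagged = "X-Gate-Anonymize-1" in response_headers
    authenticated = token is not None and verify(token)
    return (not pii_flagged) or authenticated


def fake_verify(token: str) -> bool:
    """Stand-in verifier for the sketch; a real one checks the RS256 signature."""
    return token == "valid-token"


flagged = {"X-Gate-Anonymize-1": "$.body.email 0 16 EMAIL_ADDRESS"}

cases = [({}, None), (flagged, None), (flagged, "abc"), (flagged, "valid-token")]
for headers, token in cases:
    print(f"pii={bool(headers)} token={token!r} -> {allow(headers, token, fake_verify)}")
```

Only the combination of flagged PII and a missing or invalid token is denied, which matches the two curl runs above.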

Depending on the IdP you are using, it is also possible to create more complex policies that not only check the validity of the identity token, but also examine specific properties of the token. (Look out for our next Gate blogpost for a deeper dive into this topic!)

Blocking requests to unknown URLs

More often than not, companies don’t really know which APIs are exposed to begin with. Gate can help in this scenario too.

Gate plugin instances can be applied to all routes, or you can select specific routes. In the example config below we enable the PII and OPA plugin instances on all routes and selectively disable them on specific routes:


gate:
  port: 8080
  log:
    format: text
    level: info

  plugins_http_cache:
    - pattern: "*"
      cache_control_override: private, max-age=600, stale-while-revalidate=300

  plugins:
    - id: authz_allow_if_authed_pii
      type: opa
      enabled: true
      intercept: response
      parameters:
        <<: *slashid_config
        policy_decision_path: /authz/allow
        policy: |
          package authz

          import future.keywords.if

          default allow := false

          key_found(obj, key) if { obj[key] }

          jwks_request := http.send({
              "cache": true,
              "method": "GET",
              "url": "https://api.slashid.com/.well-known/jwks.json"
          })
          valid_signature if io.jwt.verify_rs256(input.request.token, jwks_request.raw_body)

          allow if not key_found(input.response.http.headers, "X-Gate-Anonymize-1")
          allow if valid_signature
    - id: pii_anonymizer
      type: anonymizer
      enabled: true
      intercept: request_response
      parameters:
        anonymizers: |
          DEFAULT:
            type: keep

  urls:

    - pattern: "*/api/echo"
      target: http://backend:8000
      plugins:
        authz_allow_if_authed_pii:
          enabled: false
        pii_anonymizer:
          enabled: false

    - pattern: "*"
      target: http://backend:8000

Note how the plugins are defined as enabled by default and how in the URLs we explicitly disable the plugins on selected paths (e.g. "*/api/echo").

/usr/server/app $ curl --verbose -X POST 'http://gate:8080/api/echo' -d "email=abc@abc.com" | jq
Note: Unnecessary use of -X or --request, POST is already inferred.
* processing: http://gate:8080/api/echo
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 172.27.0.5:8080...
* Connected to gate (172.27.0.5) port 8080
> POST /api/echo HTTP/1.1
> Host: gate:8080
> User-Agent: curl/8.2.1
> Accept: */*
> Content-Length: 17
> Content-Type: application/x-www-form-urlencoded
>
} [17 bytes data]
< HTTP/1.1 200 OK
< Cache-Control: private, max-age=600, stale-while-revalidate=300
< Content-Length: 360
< Content-Type: application/json
< Date: Sun, 03 Sep 2023 09:30:38 GMT
< Server: uvicorn
< Via: 1.0 gate
<
{ [360 bytes data]
100   377  100   360  100    17  32933   1555 --:--:-- --:--:-- --:--:-- 37700
* Connection #0 to host gate left intact
{
  "method": "POST",
  "headers": {
    "host": "backend:8000",
    "user-agent": "curl/8.2.1",
    "content-length": "17",
    "accept": "*/*",
    "content-type": "application/x-www-form-urlencoded",
    "x-b3-sampled": "1",
    "x-b3-spanid": "39b9a26c103c6b5d",
    "x-b3-traceid": "ce0b56fc209ec47fbe0496606595c06b",
    "accept-encoding": "gzip"
  },
  "url": "http://backend:8000/api/echo",
  "body": {
    "email": "abc@abc.com"
  }
}
/usr/server/app $ curl --verbose 'http://gate:8080/api/get_user/?email=test@example.com' | jq
* processing: http://gate:8080/api/get_user/?email=test@example.com
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 172.27.0.5:8080...
* Connected to gate (172.27.0.5) port 8080
> GET /api/get_user/?email=test@example.com HTTP/1.1
> Host: gate:8080
> User-Agent: curl/8.2.1
> Accept: */*
>
< HTTP/1.1 403 Forbidden
< Cache-Control: private, max-age=600, stale-while-revalidate=300
< Content-Length: 0
< Content-Type: application/json
< Date: Sun, 03 Sep 2023 09:31:37 GMT
< Server: uvicorn
< Via: 1.0 gate
< X-Gate-Anonymize-1: $.body.email 0 16 EMAIL_ADDRESS
<
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
* Connection #0 to host gate left intact
/usr/server/app $
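One way to reason about this configuration is as plugin defaults plus per-route overrides. The sketch below models that resolution in Python. It rests on two assumptions that this post does not confirm: that routes are evaluated in order with the first match winning, and that patterns behave like shell-style globs (approximated here with fnmatchcase). The active_plugins helper is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Plugin defaults and per-route overrides, transcribed from the config above.
PLUGIN_DEFAULTS = {"authz_allow_if_authed_pii": True, "pii_anonymizer": True}
ROUTES = [
    ("*/api/echo", {"authz_allow_if_authed_pii": False, "pii_anonymizer": False}),
    ("*", {}),
]


def active_plugins(url: str) -> set[str]:
    """Return the plugins enabled for a URL (assumes the first matching route wins)."""
    for pattern, overrides in ROUTES:
        if fnmatchcase(url, pattern):
            effective = {**PLUGIN_DEFAULTS, **overrides}
            return {name for name, enabled in effective.items() if enabled}
    return set()


print(sorted(active_plugins("gate:8080/api/echo")))
print(sorted(active_plugins("gate:8080/api/get_user/")))
```

The catch-all route is what protects endpoints nobody remembered to list: anything that is not explicitly exempted inherits the default-enabled plugins.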

Running in monitoring mode

Just like the PII detection plugin, the OPA plugin also supports monitoring mode by adding monitoring_mode: true in its parameters as shown below:

    - id: authz_allow_if_authed_pii
      type: opa
      enabled: true
      intercept: response
      parameters:
        <<: *slashid_config
        monitoring_mode: true
        policy_decision_path: /authz/allow
        policy: |
          package authz

          import future.keywords.if

          default allow := false

          key_found(obj, key) if { obj[key] }

          jwks_request := http.send({
              "cache": true,
              "method": "GET",
              "url": "https://api.slashid.com/.well-known/jwks.json"
          })
          valid_signature if io.jwt.verify_rs256(input.request.token, jwks_request.raw_body)

          allow if not key_found(input.response.http.headers, "X-Gate-Anonymize-1")
          allow if valid_signature

Let’s send a request with an invalid token:

curl -H "Authorization: Bearer abc" 'http://gate:8080/api/get_user/?email=test@example.com' | jq
{
  "email": "test@example.com",
  "id": 1,
  "name": "Test User"
}

The request passes but Gate logs the policy violation:

gate-demo-gate-1        | time=2023-09-04T13:37:06Z level=info msg=OPA decision: false decision_id=d9b20a8d-da43-4786-ae15-1ec91199786d decision_provenance={0.55.0 19fc439d01c8d667b128606390ad2cb9ded04b16-dirty 2023-09-02T15:18:29Z   map[gate:{}]} plugin=opa req_path=/api/get_user/ request_host=gate:8080 request_url=/api/get_user/?email=test%40example.com
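The difference between the two modes can be summarized in a short sketch. The apply_decision function is hypothetical, the 200 status stands in for "the upstream response passes through", and logging the violation in both modes is an assumption made for the example.

```python
import logging

logging.basicConfig(level=logging.INFO, format="level=%(levelname)s msg=%(message)s")
log = logging.getLogger("opa-sketch")


def apply_decision(allowed: bool, monitoring_mode: bool) -> int:
    """Return the HTTP status the client would see for a policy decision."""
    if allowed:
        return 200
    # Record the violation (assumed here to happen in both modes).
    log.info("OPA decision: false")
    if monitoring_mode:
        return 200  # monitoring: log only, the response is passed through
    return 403      # enforcing: the request is blocked


print(apply_decision(allowed=False, monitoring_mode=True))   # 200
print(apply_decision(allowed=False, monitoring_mode=False))  # 403
```

Running a new policy in monitoring mode first lets you review the logged violations and tune the rules before switching to enforcement.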

Performance

Performance is key when intercepting and modifying network traffic, so our plugins were built with high performance in mind. For instance, we embed an optimized version of a Rego interpreter rather than standing up a separate OPA server.

Let’s look at a simple benchmark to see the impact of the two plugins on the network traffic.

Here’s a simple benchmarking script:

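#!/bin/sh
# Usage: ./benchmark.sh <iterations> <url>
iterations=$1
url=$2

echo "Running $iterations iterations for curl $url"
totaltime=0.0

for run in $(seq 1 "$iterations")
do
  # time_total is the full request duration in seconds, as reported by curl.
  elapsed=$(curl "$url" \
    -s -o /dev/null -w "%{time_total}")
  totaltime=$(echo "$totaltime + $elapsed" | bc)
done

# Convert the summed seconds into an average in milliseconds.
avgtimeMs=$(echo "scale=4; 1000*$totaltime/$iterations" | bc)

echo "Averaged $avgtimeMs ms in $iterations iterations"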

In our demo, a request without any interception results in the following:

/usr/server/app $ ./benchmark.sh 10000 'http://gate:8080/api/get_user/?email=test@example.com'
Running 10000 iterations for curl http://gate:8080/api/get_user/?email=test@example.com
Averaged 1.1820 ms in 10000 iterations
/usr/server/app $

When we enable PII detection and rewriting (hashing of the email address) coupled with our caching plugin:

/usr/server/app $ ./benchmark.sh 10000 'http://gate:8080/api/get_user/?email=test@example.com'
Running 10000 iterations for curl http://gate:8080/api/get_user/?email=test@example.com
Averaged 1.5955 ms in 10000 iterations
/usr/server/app $

Next, we test PII detection in monitoring mode:

/usr/server/app $ ./benchmark.sh 10000 'http://gate:8080/api/get_user/?email=test@example.com'
Running 10000 iterations for curl http://gate:8080/api/get_user/?email=test@example.com
Averaged 1.5176 ms in 10000 iterations
/usr/server/app $

Last, let’s run PII detection in monitoring mode coupled with OPA like we did in the example in the previous section:

/usr/server/app $ ./benchmark.sh 10000 'http://gate:8080/api/get_user/?email=test@example.com'
Running 10000 iterations for curl http://gate:8080/api/get_user/?email=test@example.com
Averaged 1.8532 ms in 10000 iterations
/usr/server/app $

Thanks to a combination of our caching plugin and Gate’s own architecture, the average overhead in our toy application is 0.6712 ms (1.8532 ms versus the 1.1820 ms baseline) when both OPA and PII detection are turned on.
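For reference, the overhead of each configuration can be derived from the four averages reported above. The short script below only redoes that arithmetic; the labels are ours.

```python
# Average latencies in milliseconds, copied from the benchmark runs above.
baseline_ms = 1.1820
runs_ms = {
    "PII hashing": 1.5955,
    "PII monitoring": 1.5176,
    "PII monitoring + OPA": 1.8532,
}

overheads = {}
for name, avg_ms in runs_ms.items():
    overhead_ms = round(avg_ms - baseline_ms, 4)
    overheads[name] = overhead_ms
    share = overhead_ms / baseline_ms
    print(f"{name}: +{overhead_ms:.4f} ms ({share:.1%} over baseline)")
```

These are single-machine numbers from a toy application, so they are best read as an order of magnitude (well under a millisecond per request) rather than as a guarantee for other deployments.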

Conclusion

In this blogpost we’ve shown how you can combine the Gate PII and OPA plugins to easily detect and prevent PII leakage.

We’d love to hear any feedback you may have! Try out Gate with a free account. If you’d like to use the PII detection plugin, please contact us at contact@slashid.dev!

Adequately protecting APIs is key to avoiding data leaks and breaches.

Just recently, an exposed API allowed an attacker to scrape over 2.6 million records from Duolingo.

In this article, we’ll show how you can use Gate to detect, respond to, and prevent these kinds of incidents.
