AI/ML: Transparency and Choice
Learn about Sentry's approach to AI/ML
These settings will be rolling out to users over the next few weeks.
Throughout Sentry’s history, we’ve operated under a policy of privacy by default. This same principle applies to our work in the Artificial Intelligence (AI) and Machine Learning (ML) space, where we want to be just as transparent about what data we’re using and why.
Sentry is at a juncture where prior heuristics-based approaches cannot sustain the demands of the product. For example, fingerprinting error events to create groups has become much more complicated with the rise of JavaScript and the use of extensions and third-party services.
To train and validate models for grouping, notifications, and workflow improvements, Sentry will need access to additional service data to deliver a better user experience.
You can update these settings within the new “Service Data Usage” section of the Legal & Compliance page in Sentry, which is located within the “Usage & Billing” Settings.
In accordance with our Terms of Service, Sentry may use non-identifying elements of your service data for product improvement. For example, we may aggregate web vitals data to show your site’s performance against a Sentry-built benchmark. The data accessed for the benchmark cannot be linked back to any particular project or customer, making it non-identifying.
For upcoming features like priority alerts or ML-based grouping, Sentry is asking for access to the following forms of service data:
- Error messages
- Stack traces
- Spans
- DOM interactions
For upcoming features like Autofix that use Generative AI and Retrieval Augmented Generation (RAG), Sentry is asking for access to the following forms of service data:
- Stack traces
- Relevant code from linked repositories
All functionality leveraging RAG will require user opt-in. By opting in, you agree to send relevant stack traces and code from your linked repositories to third-party AI subprocessors, as disclosed in our subprocessor list.
To ensure our BAA customers can remain HIPAA compliant, we will disable generative AI features in Sentry for all BAA customers by default.
Access Type | Is the underlying data identifiable? | Will this data (or any output) be shared with others? | Will this data be used for training Sentry models? | Will this data be used to train 3rd party models? |
---|---|---|---|---|
Non-identifying data | No | Other Sentry customers | Yes | No |
Aggregated identifying data | Yes | Sentry only | Yes | No |
Identifying data for generative AI features | Yes | Approved AI subprocessors | No | No |
In addition to the consent mechanisms mentioned above:
- We'll continue to encourage all customers to use our various data scrubbing tools so that service data is sanitized before we receive it.
- We'll apply the same deletion and retention rules to our training data as we do to the underlying service data. This means that if you delete service data, it will also be removed from our machine learning models automatically.
- We'll scrub data for PII before it goes into any training set.
- We'll ensure that the only service data presented in the output of any ML feature belongs to the customer using the feature.
- We'll only use AI models built in-house or provided by our existing trusted third-party subprocessors who have made contractual commitments that are consistent with the above.
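As an illustration of the client-side scrubbing encouraged above, here is a minimal sketch of a function that redacts email-like strings from an event before it leaves your application. The regular expression, the `[email]` placeholder, and the names `scrub_value` and `scrub_event` are assumptions made for this example, not part of Sentry's tooling. The function is shaped like a `before_send` callback, which the Sentry Python SDK accepts in `sentry_sdk.init`.

```python
import re

# Hypothetical pattern and placeholder; adjust both to your own data.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PLACEHOLDER = "[email]"


def scrub_value(value):
    """Recursively replace email-like substrings in strings, lists, and dicts."""
    if isinstance(value, str):
        return EMAIL_RE.sub(PLACEHOLDER, value)
    if isinstance(value, list):
        return [scrub_value(item) for item in value]
    if isinstance(value, dict):
        return {key: scrub_value(item) for key, item in value.items()}
    # Numbers, booleans, None, and other types pass through unchanged.
    return value


def scrub_event(event, hint=None):
    """Return a scrubbed copy of an event dict; the input is not modified."""
    return scrub_value(event)


# Possible wiring (requires the sentry-sdk package; the DSN is a placeholder):
#   import sentry_sdk
#   sentry_sdk.init(dsn="<your DSN>", before_send=scrub_event)
```

A single pattern like this only catches one kind of PII. In practice you would likely combine it with the built-in scrubbing settings and extend the patterns to match the data your application handles.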
We're confident that with these controls in place, we'll be able to use service data to improve our products through AI while at the same time protecting that data.