The original solution that Tecken replaces is symbols.m.o, which was a
Heroku app running an Apache server that used proxy rewrites to
draw symbols from
Its rewrite rules contained two legacy solutions:
- Uppercase the debug ID in the filename.
- Support having specific product names (e.g. firefox) prefixing the name of the symbol file.
The old symbol download server was using
was accessible only with
What It Is
Tecken Download’s primary use case is to redirect requests for symbols to
their ultimate source, which is S3. For example, with a
will return a
302 Found redirect to (at the time of writing)
This way, all configuration of S3 buckets stays central to Tecken, so nothing changes for clients if we decide to switch to a different bucket or add or remove buckets.
The primary benefit of using Tecken Download instead of hitting S3 public URLs directly is that there is just one URL to remember, and Tecken Download can iterate over a list of S3 buckets. This makes it possible to upload symbols in multiple places but have them all accessible from one URL.
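The bucket iteration described above can be sketched roughly as follows. This is only an illustration, not Tecken's actual code: `find_symbol_url`, the `BUCKETS` structure, and the example.com URLs are all hypothetical, and the in-memory key sets stand in for real S3 existence checks.

```python
from typing import Optional

# Hypothetical stand-ins for S3 buckets: each has a public base URL and
# the set of keys that "exist" in it. Real code would ask S3 instead.
BUCKETS = [
    {"base_url": "https://bucket-one.example.com/", "keys": {"a.pdb/AAA/a.sym"}},
    {"base_url": "https://bucket-two.example.com/", "keys": {"b.pdb/BBB/b.sym"}},
]


def find_symbol_url(key: str, buckets=BUCKETS) -> Optional[str]:
    """Return the URL in the first bucket that has the key, else None."""
    for bucket in buckets:
        if key in bucket["keys"]:
            # The first hit wins; a caller would redirect (302) to this URL.
            return bucket["base_url"] + key
    # No bucket has the key; a caller would answer 404.
    return None
```

Because the caller only ever sees the single lookup function, buckets can be added, removed, or reordered without changing the URL that clients use.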
The other use case is checking whether a symbol file exists. Simply make a
HEAD request instead of a GET request.
GET requests are logged and counted within Tecken. There is
a basic reporting option to extract all symbols that were requested
yesterday but couldn’t be found. Note that the format is quite
particular, since it doesn’t report the third part of the URI.
In addition, it reports two extra possible query string parameters
code_id. So if you make a query like:
…yesterday, then request the CSV report at:
https://symbols.mozilla.org/missingsymbols.csv it will contain a CSV line like this:
The CSV report ultimately exists to help the Socorro Processor, which used to manage the reporting of symbols that can’t be found during processing. See https://bugzilla.mozilla.org/show_bug.cgi?id=1361809
It only yields missing symbols whose symbol ends with .pdb and whose filename ends with .sym (case-insensitively).
The purpose of this is to get missing symbols that could be fetched
Microsoft on-the-fly Symbol Lookups
Under certain conditions, if a symbol cannot be found in S3, we might
try to look it up from Microsoft’s download server
(https://msdl.microsoft.com/download/symbols/) if the symbol file
ends in .pdb and the filename ends in .sym.
The HTTP error response code is still 404, but the response body will be
Symbol Not Found Yet (instead of Symbol Not Found).
The lookup is relatively expensive, since it depends on two network calls
(to Microsoft’s server and potentially our S3 upload)
and various command-line subprocesses,
so it’s important that it runs in the background.
Note that this operation is cached for a limited time so if you ask for the same symbol within a short window of time, it does not start another attempt to download from Microsoft.
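One way to picture the time-limited caching of lookup attempts is the sketch below. It is an assumption-laden illustration rather than Tecken's implementation: the `LookupThrottle` class, its method name, and the use of a plain dictionary are all hypothetical.

```python
import time
from typing import Optional


class LookupThrottle:
    """Remember recently started lookups so they are not started again."""

    def __init__(self, ttl_seconds: float):
        self.ttl_seconds = ttl_seconds
        self._attempted = {}  # key -> time at which the last attempt started

    def should_start(self, key: str, now: Optional[float] = None) -> bool:
        """Return True and record the attempt, unless a recent one exists."""
        if now is None:
            now = time.monotonic()
        last = self._attempted.get(key)
        if last is not None and now - last < self.ttl_seconds:
            # A lookup for this key was started within the window; skip.
            return False
        self._attempted[key] = now
        return True
```

The `now` parameter exists only so the behaviour can be exercised without waiting for real time to pass.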
All symbols that turn out not to be found are cached in an in-memory cache. However, every time a filename is matched as a candidate for download from Microsoft, the general symbol download cache is invalidated. This means you can do this:
$ curl https://symbols.mozilla.org/foo.pdb/HEXHEX/foo.sym
...
404 Symbol Not Found Yet
$ curl https://symbols.mozilla.org/foo.pdb/HEXHEX/foo.sym
...
404 Symbol Not Found Yet
$ sleep 3  # roughly assume the download + S3 upload takes less than 3 sec
$ curl https://symbols.mozilla.org/foo.pdb/HEXHEX/foo.sym
...
302
This was the original implementation: https://gist.github.com/luser/92d5bc88478665554898
We know with confidence that users repeatedly query certain files that are never in our symbol stores. We can ignore these requests to suppress logging that the files couldn’t be found.
Right now, this is maintained as a configurable blacklist but is
hard-coded inside the
_ignore_symbol code in
This approach might change over time as we’re able to confidently identify more and more patterns that we know we can ignore.
File Extension Whitelist
When someone requests to download a symbol, as mentioned above, we have some ways to immediately decide that it’s a 404 Symbol Not Found without even bothering to ask the cache or S3.
As part of that, there is also a whitelist of file extensions that are the
only ones we should bother with. This list is maintained in
settings.DOWNLOAD_FILE_EXTENSIONS_WHITELIST (managed by the environment
variable DJANGO_DOWNLOAD_FILE_EXTENSIONS_WHITELIST). It can be
found in the source code (settings.py) and is also visible on the home page
if you’re signed in as a superuser.
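A check like this could look roughly like the sketch below. The function name `has_allowed_extension`, the example extension values, and the case-insensitive comparison are assumptions made for illustration; the real list is whatever settings.DOWNLOAD_FILE_EXTENSIONS_WHITELIST contains.

```python
import os

# Hypothetical example values only; the real list is configured in
# settings.DOWNLOAD_FILE_EXTENSIONS_WHITELIST.
ALLOWED_EXTENSIONS = ["sym", "pdb"]


def has_allowed_extension(filename, allowed=ALLOWED_EXTENSIONS):
    """Return True if the filename's extension is in the allowed list."""
    # splitext("foo.sym") gives ("foo", ".sym"); no dot gives an empty string.
    extension = os.path.splitext(filename)[1].lower().lstrip(".")
    # Assumption: comparison is case-insensitive and a missing extension fails.
    return bool(extension) and extension in allowed
```

A request that fails this check can be answered with 404 Symbol Not Found immediately, without consulting the cache or S3.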
Download With Debug
To know how long it took to make a “download”, you can simply measure the time it takes to send the request to Tecken for a specific symbol. For example:
$ time curl https://symbols.mozilla.org/firefox.pdb/448794C699914DB8A8F9B9F88B98D7412/firefox.sym
Note that this tells you the total time it took your computer to make the request to Tecken plus Tecken’s time to talk to S3.
If you want to know how long it took Tecken internally to talk to S3, you can add a header to your outgoing request. For example:
$ curl -v -H 'Debug: true' https://symbols.mozilla.org/firefox.pdb/448794C699914DB8A8F9B9F88B98D7412/firefox.sym
Then you’ll get a response header called
Debug-Time. In the
output it will look something like this:
< Debug-Time: 0.627500057220459
If that value is not present, it’s because Django was not even able to
route your request to the code that talks to S3. It can also come back as
Debug-Time: 0.0, which means the symbol is in a blacklist of
symbols that are immediately
404 Not Found based on their filename pattern.
Download Without Caching
Generally we can cache our work around S3 downloads quite aggressively, since we tightly control the (only) input. Whenever a symbol archive file is uploaded, for every file within it that we upload to S3, we also invalidate that file's entry in our cache. That means we can cache information about whether certain symbols exist in S3 for quite a long time.
However, if you are debugging something, or if you manually remove a symbol from S3, that control is “lost”. There is a way to force the cache to be ignored, but it only skips reading from the cache. It will always update the cache with the fresh result.
To do this append
?_refresh to the URL. For example:
$ curl https://symbols.mozilla.org/foo.pdb/HEX/foo.sym
...302 Found...

# Now suppose you delete the file manually from S3 in the AWS Console.
# And without any delay do the curl again:
$ curl https://symbols.mozilla.org/foo.pdb/HEX/foo.sym
...302 Found...

# Same old "broken", which is wrong.
# Avoid it by adding ?_refresh
$ curl https://symbols.mozilla.org/foo.pdb/HEX/foo.sym?_refresh
...404 Symbol Not Found...

# Now our cache will be updated.
$ curl https://symbols.mozilla.org/foo.pdb/HEX/foo.sym
...404 Symbol Not Found...
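The "skip the read, always write" behaviour of ?_refresh can be sketched as below. This is a minimal model under stated assumptions, not Tecken's code: `ExistenceCache`, its `exists` method, and the `storage_lookup` callable are hypothetical names, and a dictionary stands in for the real cache backend.

```python
class ExistenceCache:
    """Cache whether a key exists in storage, with an optional forced refresh."""

    def __init__(self, storage_lookup):
        # storage_lookup is a callable taking a key and returning True/False;
        # in real code this would be an S3 existence check.
        self._lookup = storage_lookup
        self._cache = {}

    def exists(self, key, refresh=False):
        # With refresh=True the cached value is ignored on the way in...
        if not refresh and key in self._cache:
            return self._cache[key]
        result = self._lookup(key)
        # ...but the cache is always updated on the way out.
        self._cache[key] = result
        return result
```

This mirrors the curl session above: a stale positive answer persists until one refreshed request replaces it, after which ordinary requests see the corrected value.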
By default, when you request to download a symbol, Tecken will iterate through a list of available S3 configurations. By default there is really only one: the main S3 bucket for public symbols.
To download symbols that might be part of a Try build, you have to pass the
optional query string key try. Or you can prefix the URL path with /try.
$ curl https://symbols.mozilla.org/tried.pdb/HEX/tried.sym
...404 Symbol Not Found...

$ curl https://symbols.mozilla.org/tried.pdb/HEX/tried.sym?try
...302 Found...

$ curl https://symbols.mozilla.org/try/tried.pdb/HEX/tried.sym
...302 Found...
If you pass ?try in the URL or use the /try prefix, Tecken takes the
existing list of S3 configurations and
appends the S3 configuration for Try builds.
Note: symbols from Try builds are always tried last! So if there’s a known
foo.pdb/HEX/foo.sym and someone triggers a Try build
(which uploads its symbols) with the exact same name (and build ID), then
even if you use ?try,
the existing (non-Try build) symbol will be matched first.
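The ordering rule can be illustrated with the short sketch below. The function names and the bucket labels "public" and "try" are hypothetical, and the dictionary of key sets stands in for real S3 buckets.

```python
def buckets_for_request(default_buckets, try_bucket, use_try):
    """Return the buckets to search, with the Try bucket appended last."""
    buckets = list(default_buckets)  # copy so the defaults are not mutated
    if use_try:
        buckets.append(try_bucket)
    return buckets


def first_match(key, buckets, contents):
    """Return the name of the first bucket that holds the key, else None."""
    for name in buckets:
        if key in contents.get(name, set()):
            return name
    return None


# Hypothetical contents: the same key exists in both buckets.
CONTENTS = {
    "public": {"foo.pdb/HEX/foo.sym"},
    "try": {"foo.pdb/HEX/foo.sym", "tried.pdb/HEX/tried.sym"},
}
```

Because the Try bucket is appended rather than prepended, a key present in both buckets always resolves to the non-Try one, while a key that only exists in the Try bucket is found only when Try lookups are requested.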