database and frontend for results storage
Apr 11 2017
Mar 21 2017
Production resultsdb seems to work better now, so I'm lowering the priority. However, since we don't fail tasks that fail to post results, we can't track how often that happens, and it's hard to guess how much uptime we actually have. Maybe we should start failing jobs if resultsdb responds with a 5xx? Or retry a few times and only fail eventually?
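If we go the retry route, a minimal sketch of the idea (assuming the task posts JSON over plain HTTP; the helper name, URL, and retry counts are placeholders, not anything currently in the codebase):

```python
import time
import urllib.error
import urllib.request

def post_result(url, data: bytes, attempts=3, backoff=2):
    """POST `data` to resultsdb; retry on 5xx a few times, then raise
    so the task fails visibly instead of hiding resultsdb downtime."""
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"})
    for attempt in range(1, attempts + 1):
        try:
            return urllib.request.urlopen(req, timeout=30)
        except urllib.error.HTTPError as err:
            # 4xx means a bug on our side; the final 5xx fails the job.
            if err.code < 500 or attempt == attempts:
                raise
            time.sleep(backoff ** attempt)
```

That way a resultsdb outage shows up as failed jobs we can count, instead of silently dropped results.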
Mar 17 2017
Googlebot stopped hammering us, so the new robots.txt works. It is present in ansible now. However, it didn't solve the problems: resultsdb (and execdb) is down again.
Mar 16 2017
I have deployed new robots.txt: https://taskotron.fedoraproject.org/robots.txt
We should see whether it helps in a day or so (once Google refreshes that file). Btw, these are the current per-hour access numbers from Google:
So about 3300 hits per hour on average, 55 per minute. They come in bursts, though (ten or twenty simultaneous requests, then a pause, then again).
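For reference, a robots.txt that keeps crawlers out of the result pages could look something like this (purely illustrative; the actually deployed file is at the URL above and may differ):

```
User-agent: *
Disallow: /resultsdb/
Disallow: /execdb/
```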
Mar 15 2017
A short-term patch could be to increase the number of processes allocated to the WSGI application.
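The knobs for that live in the Apache mod_wsgi config; a minimal illustrative sketch (process group name, paths, and numbers are placeholders, not what's actually deployed):

```
# Illustrative only: bump the mod_wsgi daemon process/thread counts
# so bursts of crawler traffic don't exhaust the worker pool.
WSGIDaemonProcess resultsdb processes=8 threads=4 maximum-requests=1000
WSGIScriptAlias /resultsdb /usr/share/resultsdb/resultsdb.wsgi
<Directory /usr/share/resultsdb>
    WSGIProcessGroup resultsdb
    Require all granted
</Directory>
```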
Mar 13 2017
Feb 27 2017
I think this changed quite a bit with the inception of resultsdb_conventions. Anyone working on this should probably study that project first.
Feb 21 2017
@mkrizek deployed the new updates. Seems to be working. Closing.
Feb 20 2017
I submitted the new packages to stable. I assume we'll update it on prod tomorrow, and if nothing breaks, I'll close this.
Are there still issues, or are we waiting for the changes to be deployed on all the instances? Dev seems to be working reasonably well for me.
Feb 15 2017
Feb 10 2017
@adamwill absolutely, it's just a courtesy - we are still talking conventions (a "you should provide an item" style convention, maybe, but still a convention), not hard rules, at least in the scope of the actual implementation.
Also, even if this were "the true way", I'd still see a reason to have an "I don't care, just give me a random UUID" option. A UUID is also a nice identifier: it has a constant length and similar properties that only really matter to machines, not to humans, but are important anyway.
A good thing about the actual implementation is that there won't be collisions between the "random" and "specific" UUIDs, by definition of how UUIDs work (different namespaces), so you don't even need to be concerned about "random" group results mixing up with the "proper" ones.
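To illustrate the namespace point: a "specific" group UUID can be derived deterministically (version 5) from the compose ID, while "I don't care" groups get a random version 4 UUID. A sketch, assuming a made-up namespace (resultsdb_conventions may well use a different one):

```python
import uuid

# Hypothetical namespace for compose result groups -- illustrative only,
# not necessarily what resultsdb_conventions actually uses.
COMPOSE_NS = uuid.uuid5(uuid.NAMESPACE_URL, "https://resultsdb.example/compose")

def group_uuid(compose_id: str) -> uuid.UUID:
    """Deterministic (version 5) UUID: every reporter following the same
    convention derives the same group UUID for the same compose."""
    return uuid.uuid5(COMPOSE_NS, compose_id)

# Two independent reporters agree on the group for a given compose:
assert group_uuid("Fedora-26-20170221.n.0") == group_uuid("Fedora-26-20170221.n.0")

# A random (version 4) UUID won't collide with the namespaced ones;
# even the embedded version field differs.
random_group = uuid.uuid4()
assert random_group != group_uuid("Fedora-26-20170221.n.0")
```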
Sure, it makes sense to me. I guess the only question is, does this become the One True Way of creating and identifying groups in resultsdb, or is it merely a courtesy feature in the resultsdb core for the benefit of things that respect this particular convention for naming groups and creating group UUIDs?
Feb 8 2017
Well, not a 'problem', no, not exactly. I mean, it's actually one of the things resultsdb_conventions is *specifically intended to achieve*: it allows (almost requires) different systems to report results to at least some of the same group(s).

It establishes the convention 'all test results for a compose should go into a group named for that compose', and *any* reporter that buys into the conventions 'system' and uses the 'compose' convention (or a child of it) will put its results into that group. It's really almost the opposite of what you're talking about: conventions is explicitly a system for *enabling* different systems/submitters to collate / relate related results (in groups, and also by having similarly formatted extradata).

Further, at least in my head, the conventions aren't tied to resultsdb_conventions-the-codebase; that's just a current implementation detail. It seems entirely reasonable to me that someone might build another submitter that doesn't actually use resultsdb_conventions (write it in Go or whatever you like), but does respect the same *conventions*.
I understand now why you want to be able to query by group name, have the names unique, and use consistent and predictable naming. And it makes sense. Originally we never imagined such a use case, I believe, and simply used groups to keep results from a single task run together (which is sometimes useful when a human inspects the results, that's all), while waiting for more use cases (Josef, correct me if I'm wrong). The way you want to use groups is exactly such a new use case, and I would personally probably have solved it with extradata if I hadn't read this. But using groups can be a better way to tackle this, and maybe even less demanding on the database (Josef?).
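The "query by group" use case then boils down to one API call. A sketch of building such a query (the API base path and parameter names follow the resultsdb 2.0 API as I remember it and should be double-checked; the helper is hypothetical):

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed base path of the production resultsdb API -- verify before use.
API = "https://taskotron.fedoraproject.org/resultsdb_api/api/v2.0"

def results_in_group(group_uuid: str, testcase: Optional[str] = None) -> str:
    """Build a query URL for all results in one group, optionally
    narrowed to a single testcase."""
    params = {"groups": group_uuid}
    if testcase:
        params["testcases"] = testcase
    return f"{API}/results?{urlencode(params)}"
```

With predictable group UUIDs, any consumer can construct this URL without first searching for the group.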
Feb 6 2017
I was going to have it done already, but couldn't while my web server was down. I'll write one today or tomorrow.
Feb 3 2017
We need a blogpost about this new awesome feature! I see 4 volunteers:
I don't really care who does it, we can even pick randomly!
Once this is fixed, please also enable automatic test suite execution in Phab, so that we don't encounter this again in the future. See D1113 to learn how.
Feb 2 2017
Should be fixed in d39eadc6f8435e2d9.
The changes are in git. Assigning to kparal, so he can close it once packages are built (or assign to someone who'll be responsible for deploying the updated packages).
Fixed by D1089
Feb 1 2017
There's still something wrong: all the testcases on https://taskotron.fedoraproject.org/resultsdb/testcases link back to that same URL. The same happens on dev. Is that a different bug?
Also D1107, pls.
The problem was http(s) mixed content - the CDN used for the typeahead was served over http, which shut down the whole search function.
[mkrizek@resultsdb01 ~][PROD]$ rpm -q resultsdb_frontend resultsdb python-resultsdb_api
resultsdb_frontend-1.2.0-1.fc24.noarch
resultsdb-2.0.2-1.fc24.noarch
python-resultsdb_api-1.3.0-1.fc24.noarch
Jan 31 2017
D1102 is addressing this, it's almost baked, should be done tomorrow.
Closed by https://phab.qa.fedoraproject.org/D1019