
Re: Xbox dump differences

Posted: Sun Apr 08, 2018 3:53 pm
by limbo43
Still working on figuring this out. I used the same drive to re-dump several of the affected games and they then dumped correctly. Further analysis of my bash history reveals that most of these bad games were dumped back-to-back; I think the drive or laser was in an odd state that persisted across discs.

One suspicion I have is that the Kreon firmware's "error skipping" feature could be a culprit, but I don't know exactly what it does. If it returns zero bytes for what seem to be bad sectors instead of failing, that might be why. FreeCell does not send the CDB that enables or disables Kreon's error skipping feature, so if it's on by default, that could be a problem. According to the NFO the feature is enabled by default for "360 games", but maybe it's on for both? If certain errors that _should_ bubble up do not, FreeCell could be producing silently corrupted dumps. If this ends up being the culprit, FreeCell will need a patch to issue that CDB. (It's also possible this has nothing to do with anything and the error skipping feature isn't on by default.)

So far I have not reproduced the error condition--every dump I do with the "bad" drive is correct. I've tried continuously looping multiple dumps on all 3 drives at the same time to simulate the conditions and have come up short. It's driving me crazy that I can't get it to happen again.

Affected dumps may be harder to detect than I thought. The most obvious ones just have all zeroes for 2-6 sectors immediately following the layerbreak, but there are some examples where there is a random perforation of zeroes in those sectors instead. This means it may be nearly impossible to detect good vs. bad dumps without a true redump by multiple people. Since we know I'm not the only affected user (h0lylag's NFS dump comes to mind) this could mean that there's a risk to any non-redumped Xbox title in the database today. Scary thought.
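
A rough sketch of how the zero-filled sectors described above could be flagged automatically. It assumes 2048-byte sectors in the image and that the caller already knows the index of the layerbreak sector within the image file; `scan_after_layerbreak`, `window`, and `threshold` are hypothetical names and values chosen for illustration, not part of FreeCell or any existing tool. Legitimate data can also contain zeroes, so a hit is only a hint that a redump is worth doing, not proof of a bad dump.

```python
SECTOR_SIZE = 2048  # user-data bytes per DVD sector


def zero_fraction(sector: bytes) -> float:
    """Return the fraction of bytes in the sector that are 0x00."""
    if not sector:
        return 0.0
    return sector.count(0) / len(sector)


def scan_after_layerbreak(image_path, layerbreak_sector, window=16, threshold=0.5):
    """List (sector_index, zero_fraction) for mostly-zero sectors after the layerbreak.

    layerbreak_sector is the sector index inside the image file (an assumption:
    the caller has to work this out for their own dump layout).
    window is how many sectors after the layerbreak to inspect.
    threshold is the zero fraction at or above which a sector is reported.
    """
    suspects = []
    with open(image_path, "rb") as f:
        f.seek(layerbreak_sector * SECTOR_SIZE)
        for offset in range(window):
            sector = f.read(SECTOR_SIZE)
            if len(sector) < SECTOR_SIZE:
                break  # reached the end of the image
            fraction = zero_fraction(sector)
            if fraction >= threshold:
                suspects.append((layerbreak_sector + offset, fraction))
    return suspects
```

A threshold of 1.0 only catches the fully zeroed sectors; a lower value is needed for the "perforated" case, at the cost of more false positives.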

Still trying to repro; will update once I have more information.

Re: Xbox dump differences

Posted: Sun Apr 08, 2018 5:01 pm
by reentrant
You could do a test. Scratch the disc and try to read it...

Re: Xbox dump differences

Posted: Sun Apr 08, 2018 6:04 pm
by limbo43
reentrant wrote: You could do a test. Scratch the disc and try to read it...
I did try editing sectors.txt to intentionally omit a few SS areas to see what the drive would do. FreeCell continued to function and the drive returned what appeared to be random data for those sectors. Not sure if it's the same issue, though.

Re: Xbox dump differences

Posted: Mon Apr 16, 2018 3:33 am
by limbo43
I think I've gotten closer to the root cause. After redumping the 26 games that had this problem, I noticed that the affected drive begins failing the same way after being used continuously for a certain amount of time. It seems like it's an overheating or mechanical stress issue that affects the drive in such a way that the laser is not refocusing on the second layer fast enough at the layerbreak. As a result the drive is reading zeroes when it should read data. I don't know enough about the drive internals and hardware debugging to completely isolate the issue, but I found that letting the drive cool off is enough to get another clean dump, and if I do a lot of dumps back-to-back it eventually fails consistently.

Therefore, I'm trashing this drive, but we now know what one symptom of a bad drive looks like, and that existing utilities do not detect it. We have only found one dump so far besides my own that shows this issue, and I already redumped that game, but I will do a deeper dive soon.
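
A minimal sketch of how a suspect dump could be compared against a clean redump of the same disc to see exactly which sectors diverge. It assumes both images use 2048-byte sectors; `differing_sectors` and `group_runs` are hypothetical helper names, not functions from any existing dumping tool.

```python
SECTOR_SIZE = 2048  # user-data bytes per DVD sector


def differing_sectors(path_a, path_b, sector_size=SECTOR_SIZE):
    """Return the indices of sectors whose contents differ between two images.

    If one image is shorter, the sectors it lacks count as differing.
    """
    diffs = []
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        index = 0
        while True:
            sector_a = a.read(sector_size)
            sector_b = b.read(sector_size)
            if not sector_a and not sector_b:
                break  # both images exhausted
            if sector_a != sector_b:
                diffs.append(index)
            index += 1
    return diffs


def group_runs(indices):
    """Collapse sorted sector indices into inclusive (first, last) runs."""
    runs = []
    for i in indices:
        if runs and i == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], i)  # extend the current run
        else:
            runs.append((i, i))  # start a new run
    return runs
```

For the failure described here, the expectation would be a single short run starting right at the layerbreak; differences scattered elsewhere would point to some other problem.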

I am posting a fix thread now with new checksums for all of the affected games:
/viewtopic.php?p=42778#p42778

Re: Xbox dump differences

Posted: Mon Apr 16, 2018 3:45 am
by limbo43
By the way, I wrote something to check the feature set of my drives with Kreon, and the "error skipping" feature was explicitly not available on my model. Furthermore, I found some posts by Kreon himself explaining that the feature just shortens how long the drive will lock up/retry internally before unblocking and returning a sense error. So that wasn't related at all.