This document lists evaluation and comparison strategies for some commonly encountered purposes. They are empirical, possibly imperfect, and not the product of exact science, but I have used them myself throughout my life to evaluate, compare and decide between similar things that I could not immediately or intuitively judge. The generic approach is to find enough significant aspects that can be scored. Significance is hard to define and sometimes subjective, so the best definition of it is probably anything that you don't find trivial.
Situations may come up where someone wants to compare a hotel H1 with, let's say, a rating R1=4.5 stars and N1=4000 reviews (on Google Maps or Tripadvisor, for example) with another hotel H2 that has a rating R2=4.4 stars and N2=5000 reviews. At first glance one could say H1 is better simply because R1>R2. But if you think about it, the more reviews a reviewable entity has, the more opportunities people have had to detract from a perfect 5-star rating. Let's call this a rating loss (L) and calculate which hotel from our example is losing rating faster. L1=(5-R1)/N1, where 5 is the maximum rating (usually). This means L1=(5-4.5)/4000=0.5/4000=0.000125. Likewise L2=(5-R2)/N2=(5-4.4)/5000=0.6/5000=0.00012. As you can see, L1>L2, which means H1 is losing rating "faster" than H2. In other words, hotel H2 lost less rating per review than hotel H1; therefore L seems to be a more relevant measure than R, but with L it's the other way around: the smaller the better. Sometimes the margins are small, but decisions are sometimes made on small margins when other criteria don't show a clear winner.

Of course, this method is not perfect. Its linearity might not be entirely fitting, as I tend to believe that rating loss decreases exponentially rather than linearly the more reviewers there are. But it's just a tool. Another disadvantage: you cannot apply it when the number of reviews is relatively small, because the results would be skewed (e.g. R1=5 with N1=10 gives L1=0, which beats any R2<5, even R2=4.9 with an astronomically big N2). One advantage is that this tool can be used (as long as you are willing to invest the calculation effort) for anything that can be rated or reviewed, from hotels and restaurants on Google Maps or Tripadvisor, to places to stay on Booking.com or Airbnb, to apps on Google Play Store or Apple App Store, and the list can go on.
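The rating-loss calculation for the two hotels can be sketched in a few lines of Python. This is only an illustration: the function name `rating_loss` and its parameters are made up for this sketch, and the maximum rating of 5 is the same assumption the text makes.

```python
def rating_loss(rating: float, reviews: int, max_rating: float = 5.0) -> float:
    """Rating lost per review: (max_rating - rating) / reviews. Lower is better."""
    if reviews <= 0:
        raise ValueError("reviews must be a positive number")
    return (max_rating - rating) / reviews


# Values from the hotel example: H1 (4.5 stars, 4000 reviews), H2 (4.4 stars, 5000 reviews)
l1 = rating_loss(4.5, 4000)
l2 = rating_loss(4.4, 5000)

print(f"L1 = {l1:.6f}")
print(f"L2 = {l2:.6f}")
print("H2 loses less rating per review" if l2 < l1 else "H1 loses less rating per review")
```

The sketch does nothing about the small-sample problem: a hotel with 10 reviews and a 5.0 rating still gets a loss of 0.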
2.1 Functional transparency
- Available functionality is represented on the watch dial/face
2.2 Easy maintenance
2.3 Comfortable strap/band
- Doesn't pull on hair/skin
- The buckle/fastener is not sharp on the skin
- The buckle/fastener can be set to its closed position easily/correctly
- Matches the hand / Easily customizable in length
2.4 Comfortable watch body/bezel
- Stability / Control
- Doesn't uncomfortably/incorrectly press on the skin/hand
- Weight is proportional to the amount of relevant functionality
- Matches the hand / Easily customizable in length
2.5 Original watch
- Certainty (or high probability) that the watch is an original
Examples of some arbitrary and subjective evaluations (one score per criterion above, added up):
- Score(Fossil FB01)=1+0.25+1+1+1+0.75+1+1+0.75+1+1=9.75
- Score(Casio A163W)=1+1+0+1+0.75+1+0.5+1+1+1+1=9.25
- Score(Xiaomi Mi Band 5)=1+0.75+1+1+0+1+1+0+1+1=8.75
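Adding up the per-criterion scores can be sketched as follows. The per-criterion values are copied from the Fossil and Casio examples above; the helper name `total_score` and the rule that each score must lie between 0 and 1 are assumptions made for this sketch, not something the text prescribes.

```python
def total_score(criterion_scores: list[float]) -> float:
    """Sum per-criterion scores; each is assumed to lie between 0 and 1."""
    for score in criterion_scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of the assumed 0..1 range: {score}")
    return sum(criterion_scores)


# One entry per criterion, in the order the criteria are listed above
watches = {
    "Fossil FB01": [1, 0.25, 1, 1, 1, 0.75, 1, 1, 0.75, 1, 1],
    "Casio A163W": [1, 1, 0, 1, 0.75, 1, 0.5, 1, 1, 1, 1],
}

totals = {name: total_score(scores) for name, scores in watches.items()}

# Print the watches from best to worst total
for name, total in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {total}")
```

Keeping the scores in a list per watch makes it easy to see which criterion cost a watch its points, or to weight some criteria more heavily later.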
3.1 Grip
- How much its form, bezel shape and other shape/angle/size-related factors allow for grip
- How much material and texture allow for grip
- Weight is relatively and acceptably balanced (as opposed to too top-heavy, too bottom-heavy, etc)
3.2 Input & Output
- Has a USB-C port
- Has a 3.5mm jack
- Has stereo speakers
- How clicky the buttons are
- Has OLED (or OLED-like) technology that completely powers off dark pixels
- Has an acceptable form of OIS (optical image stabilization), the best of which is probably hardware-based
3.3 Power consumption & Feature balance
Nowadays most phones come with a reasonable battery, reasonable processing power and a decent screen out of the box, though it's all relative to multiple factors. For example, some people want a bigger screen/phone size, while others want a smaller one. Some people care a lot about battery, while others leave the phone charging daily/nightly without thinking too much about it. 2GHz of processing power on one chip is not exactly the same as 2GHz on another chip. One completely arbitrary, first-approximation metric would be battery/resolution/processing, i.e. battery capacity divided by the screen's pixel count and then by the total clock speed (cores x GHz). The idea is that if the screen is big or the processing power is high, then at least the battery should be on par.
Examples:
- for Samsung A71, this metric would be 4500mAh/(2400px x 1800px)/(8 cores x 2.2GHz)=0.0000591856
- for iPhone XR, it would be 2942mAh/(1792px x 828px)/(6 cores x 2.49GHz)=0.00013271613
One interpretation of these numbers, at first glance, is that the Samsung A71 has less battery available per pixel and per GHz, and so draws down its battery faster than the iPhone XR. But the interpretation of this metric, in case you find it useful, is up to you, as is the case with all the other metrics and bullet points on this page. Obviously this metric is not perfect: for example, just adding up the clock speed of each core doesn't necessarily reflect the realistic processing power at a given time. Another disadvantage is that we are not factoring the operating system footprint into these formulas. The advantage, though, is that this metric can be adapted (either simplified or expanded) to your needs.
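As a rough illustration, the battery/resolution/processing metric can be computed as below. The function name `battery_balance` and its parameter names are made up for this sketch. The input figures are the ones used in the two examples above, and treating processing power as cores x GHz is the same simplification the text makes.

```python
def battery_balance(battery_mah: float, width_px: int, height_px: int,
                    cores: int, ghz_per_core: float) -> float:
    """Battery capacity divided by pixel count and by summed clock speed.

    A higher value means more battery per pixel and per GHz.
    """
    pixels = width_px * height_px
    processing = cores * ghz_per_core  # naive sum of per-core clock speeds
    return battery_mah / pixels / processing


# Figures as given in the examples above
a71 = battery_balance(4500, 2400, 1800, cores=8, ghz_per_core=2.2)
xr = battery_balance(2942, 1792, 828, cores=6, ghz_per_core=2.49)

print(f"Samsung A71: {a71:.10f}")
print(f"iPhone XR:   {xr:.10f}")
```

Because the three inputs are separate parameters, the metric is easy to adapt, for instance by replacing the naive cores x GHz product with a benchmark score.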
All of the scores/comparisons above can be normalized by price if this is a sufficiently important criterion (and in life it seems that price is important). All it takes is to divide or multiply the score by the price. For example, in the hotel case from point 1, let's say that the price per night for H1 is P1=$200 and for H2 it is P2=$220. Now let's compare L1*P1 with L2*P2. L1*P1=0.000125*200=0.025; L2*P2=0.00012*220=0.0264. So even though the speed of "rating loss" for the second hotel is lower (which means H2 is theoretically better), when price is also weighed into the loss, the second hotel loses more. But how much weight price has inside this formula is up to you; in the end, it does not have to be linear.

Now let's show this for watches as well: if S(Watch1)=9.75 and S(Watch2)=9.25, then Watch1 is theoretically better than Watch2. But if price is a factor, then we have to compare S(Watch1)/Price(Watch1) with S(Watch2)/Price(Watch2). If Price(Watch1)=$35 and Price(Watch2)=$30, then S(Watch1)/Price(Watch1)≈0.279 is smaller than S(Watch2)/Price(Watch2)≈0.308, even though S(Watch1)>S(Watch2). Note that in the hotel example the score is better when lower (since it represents loss), therefore we multiplied by price. In the watch example we divided by price because the score is better when higher. It depends on what type of score/rating you are actually calculating.
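The multiply-or-divide rule can be sketched as a single helper. The name `price_weighted` and the `lower_is_better` flag are made up for this sketch, the linear weighting is only the simplest choice, and the hotel prices are taken as $200 and $220, the values used in the arithmetic above.

```python
def price_weighted(score: float, price: float, lower_is_better: bool) -> float:
    """Weigh a score by price, linearly.

    Loss-like scores (lower is better) are multiplied by price;
    quality-like scores (higher is better) are divided by price.
    """
    if price <= 0:
        raise ValueError("price must be positive")
    return score * price if lower_is_better else score / price


# Hotels: rating loss, lower is better
h1 = price_weighted(0.000125, 200, lower_is_better=True)
h2 = price_weighted(0.00012, 220, lower_is_better=True)
print(f"H1: {h1:.4f}  H2: {h2:.4f}")  # the smaller value wins

# Watches: summed score, higher is better
w1 = price_weighted(9.75, 35, lower_is_better=False)
w2 = price_weighted(9.25, 30, lower_is_better=False)
print(f"Watch1: {w1:.3f}  Watch2: {w2:.3f}")  # the larger value wins
```

The direction of the comparison is unchanged after weighting: a price-weighted loss is still better when smaller, and a price-weighted score is still better when larger.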