Data scraping: Phone dataset from Versus.com

Introduction

Versus.com is a comparison website that covers many categories, from tech devices and food to universities and football teams.

This phone dataset was obtained by scraping Versus.com. Without that website, the dataset would not exist.

Format

One CSV file with 3975 rows and 75 columns.
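
As a rough sketch of how a file of this shape could be loaded and checked, the snippet below uses Python's standard `csv` module. The file name `phones.csv` is hypothetical (the original download has been removed), and the small in-memory sample, with made-up values, only stands in for the real data.

```python
import csv
import io


def load_rows(fileobj):
    """Read a CSV file object into a list of dicts, one per phone."""
    return list(csv.DictReader(fileobj))


def shape(rows):
    """Return (number of rows, number of columns)."""
    n_cols = len(rows[0]) if rows else 0
    return len(rows), n_cols


# Stand-in sample: column names come from the column description,
# the values are invented for illustration.
sample = io.StringIO(
    "id,name,battery_power,weight\n"
    "1,Phone A,3000,150\n"
    "2,Phone B,4000,180\n"
)
rows = load_rows(sample)
print(shape(rows))

# With the real file (hypothetical name), the shape should match the
# figures quoted above, assuming they count data rows only:
# with open("phones.csv", newline="", encoding="utf-8") as f:
#     print(shape(load_rows(f)))
```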

Download

The dataset was previously available from my public repository. It has been removed at the request of Versus.com.

Demonstrations

Correlation of some main properties:

Figure: correlation matrix of the main specs.
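
A correlation matrix like this one can be computed in a few lines. The sketch below is not the code behind the figure: it implements the Pearson correlation by hand on made-up values for three of the columns, and it assumes that any missing values have already been dropped.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map every pair of column names to their Pearson correlation."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Invented values for three columns named in the column description.
columns = {
    "weight": [140.0, 150.0, 165.0, 180.0, 210.0],
    "battery_power": [2000.0, 3000.0, 3000.0, 4000.0, 5000.0],
    "thickness": [9.5, 8.0, 8.5, 7.5, 9.0],
}
matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

In this toy sample, weight and battery power are strongly positively correlated, which is the kind of relationship the next plot looks at more closely.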

A closer look at the relationship between weight and battery power:

Figure: weight vs. battery power.

Battery capacity histogram:

Figure: battery capacity histogram.

3000 mAh is the most common battery capacity in this dataset, followed by 4000 mAh and then 2000 mAh. Manufacturers seem to prefer round numbers.

At the other extreme, several phones have more than 10000 mAh, and the largest has 18000 mAh.
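
The counts behind a histogram like this can be reproduced with `collections.Counter`. This is a minimal sketch on an invented list of capacities, chosen only to mirror the pattern described above. It is not the real `battery_power` column.

```python
from collections import Counter

# Invented capacities in mAh, standing in for the battery_power column.
battery_power = [3000, 4000, 3000, 2000, 3000, 4000, 18000, 2000, 3000, 4000, 10500]

# The three most frequent capacities, with their counts.
counts = Counter(battery_power)
top_three = counts.most_common(3)
print(top_three)

# Phones above 10000 mAh, and the largest capacity overall.
large = sorted(c for c in battery_power if c > 10000)
print(large, max(battery_power))
```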

Column description

id | a unique identifier for each phone.

name | phone name.

antutu_benchmark_score | AnTuTu is one of the most important benchmarks for Android devices. The score reflects the overall performance of a device by summing up the results of individual tests that measure various parameters, such as RAM speed, CPU performance, and 2D & 3D graphics performance.

battery_life | The device’s battery life (when in use) as given by the manufacturer. With longer battery life, you have to charge the device less often.

battery_power | Battery power, also called battery capacity, represents the amount of electrical energy stored in the battery. Measured in milliampere-hours (mAh), it indicates how much electric power can be used over time. More battery power translates into longer battery life.

bits_executed_at_a_time | NEON provides acceleration for media processing, such as listening to MP3s.

bluetooth_version | Bluetooth is a wireless technology standard that allows data transfers between devices placed in close proximity, using short-wavelength, ultra-high frequency radio waves. Newer versions provide faster data transfers.

body_sar__us | SAR (Specific Absorption Rate) describes how much radiofrequency energy emitted by the device will be absorbed by your body. The rate is measured at the hip level. The legal limit is 1.6 W/kg in the U.S.

charge_time | The time it takes to fully charge the battery.

continuous_shooting_at_high_resolution | Fast continuous shooting is useful for catching action shots.

contrast_ratio | Contrast ratio is the visual distance between the lightest and the darkest colors that may be reproduced on the display. A high contrast ratio is desired, resulting in richer dark colors and more distinctive color gradation.

cpu_speed | The CPU speed indicates how many processing cycles per second can be executed by a CPU, considering all of its cores (processing units). It is calculated by adding the clock rates of each core or, in the case of multi-core processors employing different microarchitectures, of each group of cores.

cpu_threads | More threads result in faster performance and better multitasking.

directx_version | DirectX is used in games, with newer versions supporting better graphics.

download_speed | The download speed is a measurement of the internet connection bandwidth, representing the maximum data transfer rate at which a device can access online content.

emojis_available | Devices support emoji characters differently, depending on the operating system version. With more emojis available, users can convey a broad range of emotions in their messages in a fun, light-hearted way.

exposure_time | When the camera shutter is open for a long time, more light is absorbed by the sensor. You can use this slow shutter speed to produce vivid night time photography, or capture the trajectory of moving objects such as motorway traffic.

field_of_view | A wider field of view assures an immersive and realistic experience.

floating_point_performance | Floating-point performance is a measurement of the raw processing power of the GPU.

front_end_width | The CPU can decode more instructions per clock (IPC), meaning that the CPU performs better.

geekbench_result | This is a cross-platform benchmark that measures the performance of the CPU. (Source: Primate Labs, 2020)

gorilla_glass_version | Gorilla Glass is one of the most popular brands of chemically strengthened glass, manufactured by Corning. Several versions have been developed, the newer ones being more durable and providing better damage resistance.

gpu_clock_speed | The graphics processing unit (GPU) has a higher clock speed.

gpu_memory_speed | The memory clock speed is one aspect that determines the memory bandwidth.

head_sar__eu | SAR (Specific Absorption Rate) describes how much radiofrequency energy emitted by the device will be absorbed by your body. The rate is measured at the head level. The legal limit is 2.0 W/kg in the EU.

head_sar__us | SAR (Specific Absorption Rate) describes how much radiofrequency energy emitted by the device will be absorbed by your body. The rate is measured at the head level. The legal limit is 1.6 W/kg in the U.S.

height | The height represents the vertical dimension of the product. We consider a smaller height better because it assures easy maneuverability.

included_sd_card__memory_size | Some manufacturers include a memory card in the box. An SD card with a higher storage capacity allows you to extend the storage space on your device with ease, providing you with more space for your data.

ingress_protection__ip__rating | The Ingress Protection (IP) rating classifies the degree of protection against dust and water. Higher ratings are better. For example, a device rated IP68 can resist immersion into water for a certain time, specified by the manufacturer.

internal_storage | The internal storage, also called read-only memory (ROM), refers to the built-in storage space available in a device for system data, apps, and user-generated data. With a large amount of internal storage, you can save more files and apps on your device.

lowest_potential_operating_temperature | The minimum temperature at which the device can perform to the optimal level.

luminance | Luminance is the intensity of light that a device emits.

maximum_amount_of_external_memory_supported | The maximum amount of external storage memory supported by the device.

maximum_focal_length | A longer maximum focal length allows you to focus in on a small part of a scene and offers a narrower angle of view than shorter focal lengths.

maximum_light_sensitivity | With a higher light sensitivity (ISO level), the sensor absorbs more light. This can be used to capture moving objects using a fast shutter speed or to take images in low light without using a flash.

maximum_memory_bandwidth | This is the maximum rate that data can be read from or stored into memory.

maximum_operating_temperature | The maximum temperature at which the device can perform to the optimal level.

megapixels__front_camera | The number of megapixels determines the resolution of the images captured with the front camera. A higher megapixel count means that the front camera is capable of capturing more details, an essential factor for taking high-resolution selfies.

megapixels__main_camera | The number of megapixels determines the resolution of the images captured with the main camera. A higher megapixel count means that the camera is capable of capturing more details. However, the megapixel count is not the only important element determining the quality of an image.

memory_bandwidth | A higher memory bandwidth means the memory can be accessed faster and therefore data can be retrieved quicker, having a positive effect on the performance.

minimum_focal_length | A shorter minimum focal length allows you to get more of the scene in the photo and offers a wider angle of view than longer focal lengths.

more_languages_supported | The number of languages in which the operating system is available. Operating systems with a broader range of supported languages provide an intuitive interface for users around the world.

movie_bitrate | The higher the movie recording bitrate, the better the movie quality, with more and crisper details and fewer compression artifacts.

number_of_flash_leds | Multi-LED camera flashes use LED lights that have different color temperatures (warm light and cool light), improving the color balance based on the conditions in which the photos are taken.

number_of_microphones | More microphones result in better sound quality and enable the device to filter out background noise.

opencl_version | Some apps use OpenCL to apply the power of the graphics processing unit (GPU) for non-graphical computing. Newer versions introduce more functionality and better performance.

opengl_es_version | OpenGL ES is used for games on mobile devices such as smartphones. Newer versions support better graphics.

opengl_version | OpenGL is used in games, with newer versions supporting better graphics.

openvg_version | OpenVG is used to improve the rendering of 2D graphics on mobile devices, for example, the user interface (UI) on a smartphone.

optical_zoom | The zoom range is the ratio between the longest and shortest focal lengths. A higher zoom range means that the lens is more versatile.

overall_score | The Versus score measures the overall performance, on a scale from 1 to 100. The score is calculated taking into account all properties, giving a higher weight to the properties that our users vote for as highly relevant.

pixel_density | Pixel density is a measurement of a screen’s resolution, expressed as the number of pixels per inch (PPI) on the screen. A higher pixel density translates into more clarity and sharpness for the images rendered on the screen, thus improving the quality of the viewing experience.

pixel_rate | The number of pixels that can be rendered to the screen every second.

pixel_size__main_camera | The pixel size measures the length of the individual photodetectors (pixels) in the image sensor. Larger pixels can capture more light, providing better image quality, and improved low-light performance.

power_on_delay | It takes less time for the camera to turn on and take a first picture.

ram | Random-access memory (RAM) is a form of volatile memory used to store working data and machine code currently in use. It is a quick-access, temporary virtual storage that can be read and changed in any order, thus enabling fast data processing.

ram_speed | It can support faster memory, which will give quicker system performance.

refresh_rate | The frequency at which the screen is refreshed. Higher frequency results in less flickering (less noise) and more natural movement representation in action-intense scenes.

resolution | Resolution is an essential indicator of a screen’s image quality, representing the maximum number of pixels that can be shown on the screen. The resolution is given as a compound value, composed of horizontal and vertical pixels.

response_time | Response time is how long it takes for a display to change the state of pixels, in order to show new content. The less time it takes to respond, the less likely it is to blur fast-changing images.

screen_size | The bigger the screen size is, the better the user experience.

semiconductor_size | Small semiconductors provide better performance and reduced power consumption. Chipsets with a higher number of transistors, semiconductor components of electronic devices, offer more computational power. A small form factor allows more transistors to fit on a chip, therefore increasing its performance.

shutter_lag | The amount of time it takes the camera to take a photo, without having to focus.

thickness | We consider a thinner chassis better because it makes the product more compact and portable. Thinness is a feature highlighted by many manufacturers of mobile devices, but it is essential for a wide range of products.

upload_speed | The upload speed is a measurement of the internet connection bandwidth, representing the maximum data transfer rate at which a device can send information to a server or another device.

usb_version | Newer USB versions are faster and have better power management.

vfp_version | Vector Floating-Point (VFP) is used by the processor to deliver increased performance in areas such as digital imaging.

video_recording__main_camera | The maximum resolution available for videos shot with the main camera. Although it may be possible to choose among other frame rates, those recordings usually have lower resolutions.

volume | Volume is the quantity of three-dimensional space enclosed by the product’s chassis or, in simpler terms, the space the product occupies.

warranty_period | When covered under the manufacturer’s warranty it is possible to get a replacement in the case of a malfunction.

waterproof_depth_rating | The waterproof depth rating indicates how well the device is protected against water ingress and water pressure. Devices that can withstand more water pressure are better for swimming or diving.

weight | We consider a lower weight better because lighter devices are more comfortable to carry. A lower weight is also an advantage for home appliances, as it makes transportation easier, and for many other types of products.

wide_aperture__front_camera | With a wider aperture, the sensor can capture more light, helping to avoid blur by enabling a faster shutter speed. It also provides a shallow depth of field, allowing you to blur the background and focus attention on the subject.

wide_aperture__main_camera | With a wider aperture the sensor can capture more light, helping to avoid blur by enabling a faster shutter speed. It also provides a shallow depth of field, allowing you to blur the background to focus attention on the subject.

width | The width represents the horizontal dimension of the product. We consider a smaller width better because it assures easy maneuverability.
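
Several of the descriptions above define a column by a simple formula or threshold: `cpu_speed` as the sum of the clock rates of all cores, `optical_zoom` as the ratio between the longest and shortest focal lengths, `pixel_density` in pixels per inch, and the SAR columns against their legal limits. The sketch below works through each with made-up inputs. The function names are hypothetical, and the pixel-density formula is the conventional diagonal-based one, which is an assumption about how the value is derived rather than something Versus.com states.

```python
from math import hypot


def total_cpu_speed(core_groups):
    """Sum clock rates over all cores; core_groups is [(core_count, ghz), ...]."""
    return sum(count * ghz for count, ghz in core_groups)


def optical_zoom(max_focal_mm, min_focal_mm):
    """Zoom range: longest focal length divided by the shortest."""
    return max_focal_mm / min_focal_mm


def pixel_density(width_px, height_px, diagonal_in):
    """Pixels per inch: diagonal length in pixels over diagonal in inches."""
    return hypot(width_px, height_px) / diagonal_in


def within_sar_limit(sar_w_per_kg, region):
    """Compare a SAR value with the limits quoted above (1.6 US, 2.0 EU)."""
    limits = {"us": 1.6, "eu": 2.0}
    return sar_w_per_kg <= limits[region]


# Invented example: four cores at 2.0 GHz plus four cores at 1.5 GHz.
print(total_cpu_speed([(4, 2.0), (4, 1.5)]))  # 14.0

# Invented example: focal lengths of 26 mm and 78 mm.
print(optical_zoom(78, 26))  # 3.0

# Invented example: a 1080 x 1920 screen with a 5.5 inch diagonal.
print(round(pixel_density(1080, 1920, 5.5), 1))

# A value of 1.8 W/kg passes the EU limit but not the U.S. one.
print(within_sar_limit(1.8, "eu"), within_sar_limit(1.8, "us"))  # True False
```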
