{"id":80,"date":"2024-11-10T17:01:15","date_gmt":"2024-11-10T17:01:15","guid":{"rendered":"https:\/\/akosgombkoto.info\/?page_id=80"},"modified":"2025-02-05T09:42:31","modified_gmt":"2025-02-05T09:42:31","slug":"dbscan","status":"publish","type":"page","link":"https:\/\/akosgombkoto.info\/en\/dbscan\/","title":{"rendered":"DBSCAN"},"content":{"rendered":"<p>[et_pb_section fb_built=&#8221;1&#8243; admin_label=&#8221;Header&#8221; _builder_version=&#8221;4.27.3&#8243; _module_preset=&#8221;default&#8221; custom_padding=&#8221;0px||0px||false|false&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_row column_structure=&#8221;1_2,1_2&#8243; _builder_version=&#8221;4.27.3&#8243; _module_preset=&#8221;default&#8221; custom_padding=&#8221;0px|||||&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_column type=&#8221;1_2&#8243; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_image src=&#8221;https:\/\/akosgombkoto.info\/wp-content\/uploads\/2025\/01\/data-science-070-2.png&#8221; title_text=&#8221;data-science-070-2&#8243; _builder_version=&#8221;4.27.4&#8243; _module_preset=&#8221;default&#8221; max_width_tablet=&#8221;500px&#8221; max_width_phone=&#8221;220px&#8221; max_width_last_edited=&#8221;off|tablet&#8221; max_height_tablet=&#8221;200px&#8221; max_height_phone=&#8221;100px&#8221; max_height_last_edited=&#8221;on|phone&#8221; custom_margin=&#8221;|||-8vw|false|false&#8221; custom_margin_tablet=&#8221;|||0vw|false|false&#8221; custom_margin_phone=&#8221;&#8221; custom_margin_last_edited=&#8221;on|tablet&#8221; global_colors_info=&#8221;{}&#8221;][\/et_pb_image][\/et_pb_column][et_pb_column type=&#8221;1_2&#8243; _builder_version=&#8221;4.27.3&#8243; _module_preset=&#8221;default&#8221; custom_padding=&#8221;||||false|false&#8221; custom_padding_tablet=&#8221;0px||||false|false&#8221; custom_padding_last_edited=&#8221;off|desktop&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_text 
_builder_version=&#8221;4.27.4&#8243; _module_preset=&#8221;d72c0383-6487-4f2c-ac5c-7d48a6376757&#8243; header_font=&#8221;Roboto Slab||||||||&#8221; header_text_color=&#8221;#000000&#8243; header_font_size=&#8221;80px&#8221; header_line_height=&#8221;1.2em&#8221; custom_margin=&#8221;||10px||false|false&#8221; header_font_size_tablet=&#8221;40px&#8221; header_font_size_phone=&#8221;20px&#8221; header_font_size_last_edited=&#8221;on|phone&#8221; locked=&#8221;off&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<h1>DBSCAN<\/h1>\n<p>[\/et_pb_text][et_pb_text _builder_version=&#8221;4.27.4&#8243; _module_preset=&#8221;default&#8221; text_font=&#8221;Roboto Mono||||||||&#8221; text_text_color=&#8221;#fcb03a&#8221; text_font_size=&#8221;18px&#8221; text_line_height=&#8221;1.8em&#8221; background_color=&#8221;#042f4f&#8221; custom_padding=&#8221;15px||15px|20px|true|false&#8221; global_module=&#8221;471&#8243; saved_tabs=&#8221;all&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p><strong>Scroll down to the first section for industrial and other (e.g., 
HR) application examples.<\/strong><\/p>\n<p>[\/et_pb_text][et_pb_video src=&#8221;https:\/\/akosgombkoto.info\/wp-content\/uploads\/2024\/11\/dbscan.mp4&#8243; _builder_version=&#8221;4.27.4&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][\/et_pb_video][et_pb_video src=&#8221;https:\/\/akosgombkoto.info\/wp-content\/uploads\/2025\/02\/dbscan-smiley.mp4&#8243; _builder_version=&#8221;4.27.4&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][\/et_pb_video][\/et_pb_column][\/et_pb_row][\/et_pb_section][et_pb_section fb_built=&#8221;1&#8243; admin_label=&#8221;Blog&#8221; _builder_version=&#8221;4.27.3&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_row _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_text _builder_version=&#8221;4.27.3&#8243; _module_preset=&#8221;default&#8221; text_font=&#8221;Roboto Mono||||||||&#8221; text_text_color=&#8221;#fcb03a&#8221; text_font_size=&#8221;18px&#8221; text_line_height=&#8221;1.8em&#8221; background_color=&#8221;#042f4f&#8221; custom_padding=&#8221;15px||15px|20px|true|false&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p><strong>Industrial and Other (e.g., 
HR) Application Examples:\n(Non-industrial examples, such as HR, are located below the industrial example block.)<\/strong><\/p>\n<p>[\/et_pb_text][et_pb_text _builder_version=&#8221;4.27.4&#8243; _module_preset=&#8221;default&#8221; text_font=&#8221;Roboto Mono||||||||&#8221; text_text_color=&#8221;#ffffff&#8221; text_font_size=&#8221;16px&#8221; text_line_height=&#8221;1.8em&#8221; background_color=&#8221;#042f4f&#8221; custom_padding=&#8221;20px|20px|20px|20px|true|true&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p style=\"text-align: justify;\">Industrial Example:<\/p>\n<p style=\"text-align: justify;\" class=\"translation-block\">Monitoring Machine Conditions\nIn an automobile factory, various machines are used in the production process, such as welding machines, painting robots, and assembly lines. Sensors are installed on each machine to measure parameters like temperature, vibration, pressure, and other indicators that help determine the machine's operational state.<\/p>\n<p style=\"text-align: justify;\">How DBSCAN is Applied:<br \/>Data Collection\nReal-time data is collected from various machines in the factory (e.g., temperature, vibrations, operating hours, etc.).<\/p>\n<p style=\"text-align: justify;\">Using DBSCAN:<\/p>\n<p style=\"text-align: justify;\" class=\"translation-block\">The DBSCAN algorithm clusters (groups) the data, categorizing machines with similar operational parameters into the same group.\nMachines whose readings fall outside the normal operational range for their type (e.g., unusually high temperature or vibration) are identified as \"noise\" by the algorithm and can generate alerts for the maintenance department.<\/p>\n<p style=\"text-align: justify;\">Anomaly Detection:<\/p>\n<p style=\"text-align: justify;\">If one machine's parameters differ significantly from others of the same type (e.g., a painting robot overheating), DBSCAN labels it an outlier, which signals a potential failure.<\/p>\n<p style=\"text-align: justify;\">Repairs and 
Optimization:<\/p>\n<p style=\"text-align: justify;\">The maintenance team is notified immediately about the problematic machine before it breaks down completely. This minimizes unplanned downtime, improves machine availability, and optimizes maintenance costs.<\/p>\n<p style=\"text-align: justify;\">[\/et_pb_text][et_pb_text _builder_version=&#8221;4.27.3&#8243; _module_preset=&#8221;default&#8221; text_font=&#8221;Roboto Mono||||||||&#8221; text_text_color=&#8221;#ffffff&#8221; text_font_size=&#8221;16px&#8221; text_line_height=&#8221;1.8em&#8221; background_color=&#8221;#042f4f&#8221; custom_padding=&#8221;20px|20px|20px|20px|true|true&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p style=\"text-align: justify;\">HR Example:<\/p>\n<p style=\"text-align: justify;\">1. Identifying Recruitment Patterns<\/p>\n<p style=\"text-align: justify;\">To optimise recruitment processes, the DBSCAN algorithm can be used to analyse historical data on new hires. For instance, the algorithm might identify which profiles or skill combinations have been most successful for specific roles, thereby refining the recruitment process.<\/p>\n<p style=\"text-align: justify;\">2. Predicting Employee Turnover:<\/p>\n<p style=\"text-align: justify;\">Retaining employees is a key focus, and analysing turnover patterns can help. DBSCAN can identify trends associated with employee departures, such as lack of engagement, frequent absences, or low performance. This helps identify at-risk groups, allowing HR to take proactive steps to improve retention.<\/p>\n<p style=\"text-align: justify;\">3. Clustering Based on Performance Evaluation:<\/p>\n<p style=\"text-align: justify;\">HR databases often contain various performance metrics (e.g., number of projects, adherence to deadlines, etc.). Using DBSCAN, employees can be grouped by performance levels. 
The algorithm automatically detects performance patterns, highlighting exceptional or underperforming employees, which enables targeted development initiatives or reward systems.<\/p>\n<p>[\/et_pb_text][et_pb_toggle title=&#8221;DBSCAN clustering animated: (expandable content &#8211; click the + icon on the right edge)&#8221; open_toggle_text_color=&#8221;#fcb03a&#8221; open_toggle_background_color=&#8221;#042f4f&#8221; closed_toggle_text_color=&#8221;#fcb03a&#8221; closed_toggle_background_color=&#8221;#042f4f&#8221; icon_color=&#8221;#fcb03a&#8221; open_icon_color=&#8221;#fcb03a&#8221; _builder_version=&#8221;4.27.4&#8243; _module_preset=&#8221;default&#8221; title_font_size=&#8221;18px&#8221; closed_title_font_size=&#8221;18px&#8221; body_text_color=&#8221;#FFFFFF&#8221; body_font_size=&#8221;16px&#8221; body_line_height=&#8221;1.8em&#8221; hover_enabled=&#8221;0&#8243; global_colors_info=&#8221;{}&#8221; sticky_enabled=&#8221;0&#8243;]<\/p>\n<p style=\"text-align: justify;\"><span>In these videos, we see the DBSCAN unsupervised clustering algorithm in action.<\/span><\/p>\n<p><span>We observe how different clusters emerge point by point, without a predefined number of clusters. 
Additionally, outliers (points that do not belong to any cluster) are marked in a separate color, demonstrating anomaly detection.<\/span><\/p>\n<p><span>In the first case, the code randomly generates 300 points on a 100x100 two-dimensional plane (with a unit distance of 1) and clusters them using an Epsilon parameter (neighborhood radius) of 7.3 units and a minimum of 4 neighbors.<\/span><\/p>\n<p><span>In the second case, the code randomly generates points forming a sad face on a 500x500 two-dimensional plane (with a unit distance of 1) and clusters the points using an Epsilon parameter of 65 units and a minimum of 3 neighbors.<\/span><\/p>\n<p>[\/et_pb_toggle][et_pb_toggle title=&#8221;Explanation: (expandable content &#8211; click the + icon on the right edge)&#8221; open_toggle_text_color=&#8221;#fcb03a&#8221; open_toggle_background_color=&#8221;#042f4f&#8221; closed_toggle_text_color=&#8221;#fcb03a&#8221; closed_toggle_background_color=&#8221;#042f4f&#8221; icon_color=&#8221;#fcb03a&#8221; open_icon_color=&#8221;#fcb03a&#8221; _builder_version=&#8221;4.27.3&#8243; _module_preset=&#8221;default&#8221; title_font_size=&#8221;18px&#8221; closed_title_font_size=&#8221;18px&#8221; body_text_color=&#8221;#FFFFFF&#8221; body_font_size=&#8221;16px&#8221; body_line_height=&#8221;1.8em&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p style=\"text-align: justify;\">DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm that groups data into clusters without requiring the number of clusters to be predefined. 
This algorithm is particularly effective in handling noise (outliers) and data whose clusters are defined by regions of high point density.<\/p>\n<p style=\"text-align: justify;\">How DBSCAN Works<br \/>The DBSCAN algorithm uses the following key concepts:<\/p>\n<p style=\"text-align: justify;\">Data Point Density: DBSCAN examines the neighborhood of each data point (the points near it) and uses two fundamental parameters:<\/p>\n<p style=\"text-align: justify;\">Eps (\u03b5): The radius within which a data point is considered a neighbor of another data point.<br \/>MinPts: The minimum number of points within the \u03b5-radius for a point to qualify as a core point.<\/p>\n<p style=\"text-align: justify;\">The three types of data points:<\/p>\n<p style=\"text-align: justify;\">Core Points: Points with at least MinPts neighbors within the \u03b5-radius.<br \/>Border Points: Points within the \u03b5-radius of a core point but with fewer than MinPts neighbors of their own.<br \/>Noise Points (Outliers): Points that are neither core points nor within the \u03b5-radius of any core point; they do not belong to any cluster.<\/p>\n<p style=\"text-align: justify;\">Clustering Process:<\/p>\n<p style=\"text-align: justify;\">DBSCAN selects an arbitrary unvisited data point, and if it has at least MinPts neighbors within the \u03b5-radius, it starts forming a cluster.<br \/>The cluster is expanded by adding every point within the \u03b5-radius of its core points; newly added core points extend the cluster further.<br \/>Data points that do not belong to any cluster are considered noise points (outliers).<\/p>\n<p style=\"text-align: justify;\">Applications<br \/>Density-Based Clustering:\nDBSCAN is highly effective for irregularly shaped clusters (not just spherical) and data containing noise or outliers. 
The number of clusters does not need to be specified in advance.<\/p>\n<p style=\"text-align: justify;\">Key Application Areas:<\/p>\n<p style=\"text-align: justify;\">Geographic Data Analysis:\nDBSCAN can be used for clustering spatial data, such as grouping cities, restaurants, or stores by location.<\/p>\n<p style=\"text-align: justify;\">Anomaly Detection:\nUseful for identifying noise points or outliers in datasets, which can represent unusual events or erroneous data.<\/p>\n<p style=\"text-align: justify;\">Image Processing:\nApplicable for grouping objects in images or identifying specific visual patterns.<\/p>\n<p style=\"text-align: justify;\">Network Data Mining:\nIn internet or transportation networks, DBSCAN is valuable for analyzing the density of routes, nodes, and the connections between them. For example, it can be used to identify densely connected pathways, critical intersections, or areas with significant activity within the network.<\/p>\n<p style=\"text-align: justify;\">Advantages:<\/p>\n<p style=\"text-align: justify;\" class=\"translation-block\">Finds clusters without a predefined number of clusters.\nEffectively handles noise and outliers.\nDetects clusters of various shapes (not limited to spherical ones).<\/p>\n<p style=\"text-align: justify;\">Disadvantages:<\/p>\n<p style=\"text-align: justify;\">Sensitive to the choice of Eps and MinPts.<br \/>Can be slow with large, dense datasets; a naive implementation compares every pair of points.<\/p>\n<p style=\"text-align: justify;\">Example<\/p>\n<p style=\"text-align: justify;\">Suppose we are working with traffic data for a city and want to identify areas with high traffic density. 
Using DBSCAN, we can cluster the traffic data to pinpoint crowded, high-traffic areas (such as central intersections), while treating less-used roads or noisy data (e.g., faulty GPS coordinates) as outliers.<\/p>\n<p style=\"text-align: justify;\">Overall, DBSCAN is a powerful tool for density-based clustering, especially in cases where the number of clusters is unknown and the data contains noise points or outliers.<\/p>\n<p>[\/et_pb_toggle][et_pb_toggle title=&#8221;Core code: (expandable content &#8211; click the + icon on the right edge)&#8221; open_toggle_text_color=&#8221;#fcb03a&#8221; open_toggle_background_color=&#8221;#042f4f&#8221; closed_toggle_text_color=&#8221;#fcb03a&#8221; closed_toggle_background_color=&#8221;#042f4f&#8221; icon_color=&#8221;#fcb03a&#8221; open_icon_color=&#8221;#fcb03a&#8221; _builder_version=&#8221;4.27.3&#8243; _module_preset=&#8221;default&#8221; title_font_size=&#8221;18px&#8221; closed_title_font_size=&#8221;18px&#8221; body_text_color=&#8221;#FFFFFF&#8221; body_font_size=&#8221;16px&#8221; body_line_height=&#8221;1.8em&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<pre>import numpy as np\n\n# Distance between two points (Euclidean distance)\ndef euclidean_distance(point1, point2):\n    return np.sqrt(np.sum((point1 - point2) ** 2))\n\n# Find neighbors: indices of all points within eps of the given point\n# (the point itself is included, since its distance to itself is 0)\ndef region_query(data, point_idx, eps):\n    neighbors = []\n    for i in range(len(data)):\n        if euclidean_distance(data[point_idx], data[i]) &lt;= eps:\n            neighbors.append(i)\n    return neighbors\n\n# Expand a cluster starting from a core point\ndef expand_cluster(data, labels, point_idx, neighbors, cluster_id, eps, min_pts, visited):\n    labels[point_idx] = cluster_id\n    i = 0\n    while i &lt; len(neighbors):\n        neighbor_idx = neighbors[i]\n        if not visited[neighbor_idx]:\n            visited[neighbor_idx] = True\n            new_neighbors = region_query(data, neighbor_idx, eps)\n            # A neighbor that is itself a core point extends the search\n            if len(new_neighbors) &gt;= min_pts:\n                neighbors.extend(new_neighbors)\n        # Unassigned points (including earlier noise points) join the cluster\n        if labels[neighbor_idx] == -1:\n            labels[neighbor_idx] = cluster_id\n        i += 1\n    return labels\n\n# DBSCAN algorithm\ndef dbscan(data, eps, min_pts):\n    labels = np.full(len(data), -1)  # -1 means noise or not yet assigned\n    visited = np.full(len(data), False)\n    cluster_id = 0\n    for i in range(len(data)):\n        if visited[i]:\n            continue\n        visited[i] = True\n        neighbors = region_query(data, i, eps)\n        if len(neighbors) &lt; min_pts:\n            labels[i] = -1  # Noise point\n        else:\n            cluster_id += 1\n            labels = expand_cluster(data, labels, i, neighbors, cluster_id, eps, min_pts, visited)\n    return labels\n\n# Generate points\nnp.random.seed(42)\ndata = np.random.rand(300, 2) * 100  # 300 points in two dimensions, between 0 and 100\n\n# Set parameters\neps = 7.3  # Radius\nmin_pts = 4  # Minimum number of neighbors\n\n# Apply DBSCAN to the dataset\nlabels = dbscan(data, eps, min_pts)<\/pre>\n<p>[\/et_pb_toggle][\/et_pb_column][\/et_pb_row][\/et_pb_section]<\/p>","protected":false},"excerpt":{"rendered":"<p>DBSCAN Scroll down to the first section for industrial and other (e.g., HR) application examples. Industrial and other (e.g., HR) application examples (non-industrial examples, such as HR, are located below the industrial example block): Industrial example: Monitoring Machine Conditions In an automobile factory, various machines are used in the production process, such as welding machines, painting robots, and assembly lines. 
Sensors are installed on each machine to measure [&hellip;]<\/p>","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_et_pb_use_builder":"on","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"class_list":["post-80","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/akosgombkoto.info\/en\/wp-json\/wp\/v2\/pages\/80","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/akosgombkoto.info\/en\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/akosgombkoto.info\/en\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/akosgombkoto.info\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/akosgombkoto.info\/en\/wp-json\/wp\/v2\/comments?post=80"}],"version-history":[{"count":58,"href":"https:\/\/akosgombkoto.info\/en\/wp-json\/wp\/v2\/pages\/80\/revisions"}],"predecessor-version":[{"id":776,"href":"https:\/\/akosgombkoto.info\/en\/wp-json\/wp\/v2\/pages\/80\/revisions\/776"}],"wp:attachment":[{"href":"https:\/\/akosgombkoto.info\/en\/wp-json\/wp\/v2\/media?parent=80"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}