I'm working on document layout analysis and was looking for techniques to resolve overlapping layout classes in the same region. I started looking into the approach you use to handle this and found the function below.
I'm curious how you came up with this approach. The area/confidence base case seems pretty reasonable, but the LIST_ITEM vs. TEXT and CODE vs. everything-else special cases made me curious.
Also, how did you arrive at the threshold values? Empirical tests? If so, what criteria did you use to decide which choices were best?
    def _should_prefer_cluster(
        self, candidate: Cluster, other: Cluster, params: dict
    ) -> bool:
        """Determine if candidate cluster should be preferred over other cluster based on rules.
        Returns True if candidate should be preferred, False if not."""

        # Rule 1: LIST_ITEM vs TEXT
        if (
            candidate.label == DocItemLabel.LIST_ITEM
            and other.label == DocItemLabel.TEXT
        ):
            # Check if areas are similar (within 20% of each other)
            area_ratio = candidate.bbox.area() / other.bbox.area()
            area_similarity = abs(1 - area_ratio) < 0.2
            if area_similarity:
                return True

        # Rule 2: CODE vs others
        if candidate.label == DocItemLabel.CODE:
            # Calculate how much of the other cluster is contained within the CODE cluster
            overlap = other.bbox.intersection_area_with(candidate.bbox)
            containment = overlap / other.bbox.area()
            if containment > 0.8:  # other is 80% contained within CODE
                return True

        # If no label-based rules matched, fall back to area/confidence thresholds
        area_ratio = candidate.bbox.area() / other.bbox.area()
        conf_diff = other.confidence - candidate.confidence

        if (
            area_ratio <= params["area_threshold"]
            and conf_diff > params["conf_threshold"]
        ):
            return False

        return True  # Default to keeping candidate if no rules triggered rejection
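
To make my reading of the rules concrete, here is a minimal standalone sketch that mirrors the logic above on toy boxes. Everything in it is an assumption on my side: `Label`, `Box`, and `Cluster` are stand-ins I wrote for illustration (not the real classes), and the `PARAMS` values are placeholders I picked, not the thresholds the library actually uses.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):  # hypothetical stand-in for DocItemLabel
    TEXT = "text"
    LIST_ITEM = "list_item"
    CODE = "code"


@dataclass
class Box:  # hypothetical stand-in bounding box (left, top, right, bottom)
    l: float
    t: float
    r: float
    b: float

    def area(self) -> float:
        return max(0.0, self.r - self.l) * max(0.0, self.b - self.t)

    def intersection_area_with(self, other: "Box") -> float:
        w = min(self.r, other.r) - max(self.l, other.l)
        h = min(self.b, other.b) - max(self.t, other.t)
        return max(0.0, w) * max(0.0, h)


@dataclass
class Cluster:  # hypothetical stand-in for the real Cluster
    label: Label
    bbox: Box
    confidence: float


def should_prefer(candidate: Cluster, other: Cluster, params: dict) -> bool:
    # Rule 1: LIST_ITEM beats TEXT when the two areas are within 20%.
    if candidate.label == Label.LIST_ITEM and other.label == Label.TEXT:
        if abs(1 - candidate.bbox.area() / other.bbox.area()) < 0.2:
            return True
    # Rule 2: CODE beats anything that lies more than 80% inside it.
    if candidate.label == Label.CODE:
        overlap = other.bbox.intersection_area_with(candidate.bbox)
        if overlap / other.bbox.area() > 0.8:
            return True
    # Fallback: reject only if candidate is not much larger AND clearly less confident.
    area_ratio = candidate.bbox.area() / other.bbox.area()
    conf_diff = other.confidence - candidate.confidence
    if area_ratio <= params["area_threshold"] and conf_diff > params["conf_threshold"]:
        return False
    return True


# Placeholder thresholds, chosen only for this illustration.
PARAMS = {"area_threshold": 1.3, "conf_threshold": 0.05}

item = Cluster(Label.LIST_ITEM, Box(0, 0, 100, 20), 0.60)
text = Cluster(Label.TEXT, Box(0, 0, 100, 22), 0.95)
code = Cluster(Label.CODE, Box(0, 0, 100, 100), 0.50)
inner = Cluster(Label.TEXT, Box(10, 10, 90, 50), 0.90)
weak = Cluster(Label.TEXT, Box(0, 0, 100, 20), 0.50)
strong = Cluster(Label.TEXT, Box(0, 0, 100, 20), 0.90)
big_weak = Cluster(Label.TEXT, Box(0, 0, 100, 40), 0.50)

rule1 = should_prefer(item, text, PARAMS)               # True: similar area, label wins
rule2 = should_prefer(code, inner, PARAMS)              # True: TEXT fully inside CODE
fallback_reject = should_prefer(weak, strong, PARAMS)   # False: same size, less confident
fallback_keep = should_prefer(big_weak, strong, PARAMS) # True: area ratio 2.0 > 1.3
print(rule1, rule2, fallback_reject, fallback_keep)
```

What stands out to me in this sketch is that the two label rules ignore confidence entirely: the LIST_ITEM candidate wins at 0.60 against a 0.95 TEXT box, and the CODE candidate wins at 0.50 against 0.90. That is the part I would most like to understand the reasoning behind.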