spotterbase.dnm package

Submodules

spotterbase.dnm.defaults module

spotterbase.dnm.defaults.get_arxmliv_dnm_factory(*, decorate_replacements: bool = True, number_replacements: bool = True, keep_titles: bool = True, keep_replacements_as_annotations: bool = True, normalize_white_space: bool = True, wrap_replacements_with_spaces: bool = False) DnmFactory
spotterbase.dnm.defaults.get_simple_arxmliv_factory(*, token_prefix: str = '', token_suffix: str = '', keep_titles: bool = True, core_token_processor: Callable[[str], str] = _default_core_token_processor) SimpleDnmFactory
spotterbase.dnm.defaults.whitespace_normalization_post_processing(dnm: Dnm) Dnm

spotterbase.dnm.dnm module

class spotterbase.dnm.dnm.Dnm(*, meta_info: _MetaInfoType, string: str | None = None, start_refs: Sequence[int] | None = None, end_refs: Sequence[int] | None = None, _rel_data: _RelData[_MetaInfoType] | None = None)

Bases: LinkedStr[DnmMeta]

get_embedded_annotations() list[Annotation]
sub_dnm_from_dom_range(dom_range: DomRange) tuple[Dnm, DnmMatchIssues]
sub_dnm_from_ref_range(start_ref: int, end_ref: int) tuple[Dnm, DnmMatchIssues]
to_dom() DomRange
to_dom_offset_range() DomOffsetRange
to_fragment_target(target_uri: str | Uri | URIRef | Path | VocabularyMeta) FragmentTarget
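`sub_dnm_from_ref_range` (and `get_indices_from_ref_range`, inherited from `LinkedStr`) has to turn a range of references back into string indices. Below is a minimal sketch of that lookup. It assumes, without confirmation from this reference, that start and end references are non-decreasing along the string, and that a character belongs to the range when its start ref is at least `start_ref` and its end ref is at most `end_ref`. The function `indices_from_ref_range` is hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def indices_from_ref_range(
    start_refs: Sequence[int], end_refs: Sequence[int], start_ref: int, end_ref: int
) -> tuple[int, int]:
    """Return (start_index, end_index) of the characters whose refs lie in [start_ref, end_ref].

    Hypothetical helper; assumes both reference sequences are sorted (non-decreasing).
    """
    # First character whose start reference is at or after start_ref.
    start_index = bisect_left(start_refs, start_ref)
    # One past the last character whose end reference is at or before end_ref.
    end_index = bisect_right(end_refs, end_ref)
    if end_index < start_index:  # nothing matched: return an empty range
        end_index = start_index
    return start_index, end_index


# A three-character string whose characters come from source positions 10, 11 and 12.
start_refs = [10, 11, 12]
end_refs = [11, 12, 13]
print(indices_from_ref_range(start_refs, end_refs, 11, 13))  # (1, 3)
```

Binary search keeps the lookup logarithmic, which matters when a DNM spans a whole document.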
class spotterbase.dnm.dnm.DnmFactory

Bases: ABC

anonymous_dnm_from_node(node: _Element) Dnm
dnm_from_document(document: Document) Dnm
abstract make_dnm_from_meta(dnm_meta: DnmMeta) Dnm
class spotterbase.dnm.dnm.DnmMatchIssues(dom_start_earlier: bool, dom_end_later: bool, dom_start_later: bool, dom_end_earlier: bool)

Bases: object

When a part of the DOM is mapped to the DNM, the match may not be exact. This class records in more detail how the two ranges differ.

property dnm_covers_more: bool
property dnm_misses_something: bool
dom_end_earlier: bool
dom_end_later: bool
dom_start_earlier: bool
dom_start_later: bool
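The four flags compare the DOM range that was requested with the range the DNM can actually represent. The sketch below shows one reading of the flag names, which is an assumption and not taken from the source. Under this reading, `dom_start_earlier` and `dom_end_later` mean part of the DOM range lies outside the DNM, so the DNM misses something. Likewise, `dom_start_later` and `dom_end_earlier` mean the DNM range extends beyond the DOM range, so it covers more. `MatchIssues` and `compare_ranges` are hypothetical names that use plain half-open integer ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchIssues:
    """Hypothetical stand-in for DnmMatchIssues (flag semantics are an assumption)."""

    dom_start_earlier: bool
    dom_end_later: bool
    dom_start_later: bool
    dom_end_earlier: bool

    @property
    def dnm_misses_something(self) -> bool:
        # Part of the requested DOM range lies outside the DNM range.
        return self.dom_start_earlier or self.dom_end_later

    @property
    def dnm_covers_more(self) -> bool:
        # The DNM range extends beyond the requested DOM range.
        return self.dom_start_later or self.dom_end_earlier


def compare_ranges(dom: tuple[int, int], dnm: tuple[int, int]) -> MatchIssues:
    """Compare a requested DOM range with the DNM range that was matched (both half-open)."""
    return MatchIssues(
        dom_start_earlier=dom[0] < dnm[0],
        dom_end_later=dom[1] > dnm[1],
        dom_start_later=dom[0] > dnm[0],
        dom_end_earlier=dom[1] < dnm[1],
    )


issues = compare_ranges(dom=(5, 20), dnm=(8, 20))
print(issues.dnm_misses_something, issues.dnm_covers_more)  # True False
```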
class spotterbase.dnm.dnm.DnmMeta(dom: '_Element', offset_converter: 'OffsetConverter', selector_converter: 'SelectorConverter', uri: 'Uri')

Bases: object

dom: _Element
embedded_annotations: EmbeddedAnnotations
offset_converter: OffsetConverter
selector_converter: SelectorConverter
uri: Uri
class spotterbase.dnm.dnm.EmbeddedAnnotations

Bases: object

get_next_replacement_number(category: str) int
insert(replacement: str, range_: DomOffsetRange, annotation: Annotation, replacement_unique: bool = True)
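`get_next_replacement_number` suggests that replacements are numbered per category (for example, successive formulas). `insert` records which annotation a replacement string stands for. Below is a minimal sketch of that bookkeeping. It assumes that numbering starts at 1 and that ranges can be simplified to integer pairs. The `replacement_unique` flag is not modelled. `AnnotationRegistry`, `lookup` and the `MathFormula` naming are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AnnotationRegistry:
    """Hypothetical stand-in for EmbeddedAnnotations; ranges are plain (start, end) offsets."""

    _counters: defaultdict[str, int] = field(default_factory=lambda: defaultdict(int))
    _entries: dict[str, tuple[tuple[int, int], Any]] = field(default_factory=dict)

    def get_next_replacement_number(self, category: str) -> int:
        # Each category keeps its own counter, starting at 1 (assumption).
        self._counters[category] += 1
        return self._counters[category]

    def insert(self, replacement: str, range_: tuple[int, int], annotation: Any) -> None:
        # Remember which range and annotation the replacement string stands for.
        self._entries[replacement] = (range_, annotation)

    def lookup(self, replacement: str) -> tuple[tuple[int, int], Any]:
        return self._entries[replacement]


registry = AnnotationRegistry()
for start, end in [(0, 12), (40, 55)]:
    n = registry.get_next_replacement_number("formula")
    registry.insert(f"MathFormula{n}", (start, end), {"category": "formula"})
print(registry.lookup("MathFormula2"))  # ((40, 55), {'category': 'formula'})
```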

spotterbase.dnm.linked_str module

class spotterbase.dnm.linked_str.LinkedStr(*, meta_info: _MetaInfoType, string: str | None = None, start_refs: Sequence[int] | None = None, end_refs: Sequence[int] | None = None, _rel_data: _RelData[_MetaInfoType] | None = None)

Bases: Generic[_MetaInfoType]

Should be treated as immutable! For optimization, data is shared by reference instead of being copied (e.g. when creating a sub-linked-str).
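The idea behind a linked string is that every character carries a start and an end reference into the source it came from. The sketch below illustrates why immutability matters. Slicing returns a new object but shares the parent's reference tuples instead of copying them, which is only safe because nothing is ever mutated. The class `MiniLinkedStr`, its fields and `from_plain` are hypothetical, not the library's internals. In particular, the self-referencing `from_plain` mapping is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniLinkedStr:
    """Hypothetical, simplified linked string: a string plus per-character source references."""

    string: str
    start_refs: tuple[int, ...]  # shared between parent and substrings, never mutated
    end_refs: tuple[int, ...]    # shared between parent and substrings, never mutated
    offset: int = 0              # where this substring starts inside the shared tuples

    @classmethod
    def from_plain(cls, s: str) -> "MiniLinkedStr":
        # Assumption: a plain string is linked to itself, character i spans [i, i + 1).
        return cls(s, tuple(range(len(s))), tuple(range(1, len(s) + 1)))

    def __len__(self) -> int:
        return len(self.string)

    def sub(self, start: int, end: int) -> "MiniLinkedStr":
        # Only the string slice is new; the reference tuples are shared with the parent.
        return MiniLinkedStr(self.string[start:end], self.start_refs, self.end_refs, self.offset + start)

    def get_start_ref(self) -> int:
        return self.start_refs[self.offset]

    def get_end_ref(self) -> int:
        return self.end_refs[self.offset + len(self) - 1]


text = MiniLinkedStr.from_plain("hello world")
word = text.sub(6, 11)
print(word.string, word.get_start_ref(), word.get_end_ref())  # world 6 11
print(word.start_refs is text.start_refs)  # True
```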

char_at(pos: int) str
get_end_ref() int
get_end_refs() Sequence[int]
get_indices_from_ref_range(start_ref, end_ref) tuple[int, int]
get_meta_info() _MetaInfoType
get_start_ref() int
get_start_refs() Sequence[int]
lower() LinkedStr_T
normalize_spaces() LinkedStr_T

Replace each run of consecutive whitespace characters with a single whitespace character.

replacements_at_positions(replacements: Iterable[tuple[int, int, str]], positions_are_references: bool = True) LinkedStr_T
strip() LinkedStr_T
upper() LinkedStr_T
with_string(string: str) LinkedStr_T
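`replacements_at_positions` and `normalize_spaces` rewrite parts of the string while keeping every output character linked to the source. Below is a minimal sketch on plain lists. It assumes, without confirmation from this reference, that all characters of a replacement inherit the start reference of the first replaced character and the end reference of the last one. Positions here are string indices; with the library's default `positions_are_references=True` they would be references instead. `replace_ranges` and `normalize_spaces` are hypothetical free functions.

```python
import re


def replace_ranges(
    string: str,
    start_refs: list[int],
    end_refs: list[int],
    replacements: list[tuple[int, int, str]],
) -> tuple[str, list[int], list[int]]:
    """Apply non-overlapping (start, end, text) replacements given as string indices."""
    out: list[str] = []
    new_starts: list[int] = []
    new_ends: list[int] = []
    pos = 0
    for start, end, text in sorted(replacements):
        if start < pos or start >= end:
            raise ValueError("replacements must be non-empty and must not overlap")
        # Copy the untouched part together with its references.
        out.append(string[pos:start])
        new_starts.extend(start_refs[pos:start])
        new_ends.extend(end_refs[pos:start])
        # Assumption: every replacement character spans the whole replaced source range.
        out.append(text)
        new_starts.extend([start_refs[start]] * len(text))
        new_ends.extend([end_refs[end - 1]] * len(text))
        pos = end
    out.append(string[pos:])
    new_starts.extend(start_refs[pos:])
    new_ends.extend(end_refs[pos:])
    return "".join(out), new_starts, new_ends


def normalize_spaces(
    string: str, start_refs: list[int], end_refs: list[int]
) -> tuple[str, list[int], list[int]]:
    """Collapse each run of whitespace into a single space, keeping references."""
    runs = [(m.start(), m.end(), " ") for m in re.finditer(r"\s+", string)]
    return replace_ranges(string, start_refs, end_refs, runs)


s = "a  \n b"
refs = list(range(len(s)))
result, starts, ends = normalize_spaces(s, refs, [r + 1 for r in refs])
print(repr(result), starts, ends)  # 'a b' [0, 1, 5] [1, 5, 6]
```

The collapsed space keeps the references of the whole whitespace run (1 to 5), so it can still be mapped back to the original text.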
class spotterbase.dnm.linked_str.LinkedStr_T

Type variable bound to LinkedStr.

alias of TypeVar('LinkedStr_T', bound=LinkedStr)
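A bound type variable lets methods such as `lower()` or `strip()` declare that they return the same subclass they were called on. That way `Dnm.lower()` can type-check as a `Dnm` rather than a plain `LinkedStr`. Below is a self-contained sketch of the pattern. `Base`, `Derived` and `Base_T` are hypothetical names, and the one-argument constructor is a simplification.

```python
from typing import TypeVar

Base_T = TypeVar("Base_T", bound="Base")


class Base:
    def __init__(self, string: str) -> None:
        self.string = string

    def with_string(self: Base_T, string: str) -> Base_T:
        # type(self) keeps the subclass at runtime; Base_T tells type checkers the same.
        return type(self)(string)

    def lower(self: Base_T) -> Base_T:
        return self.with_string(self.string.lower())


class Derived(Base):
    pass


d = Derived("ABC").lower()
print(type(d).__name__, d.string)  # Derived abc
```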

spotterbase.dnm.linked_str.string_to_lstr(string: str) LinkedStr[None]

spotterbase.dnm.node_based_dnm_factory module

class spotterbase.dnm.node_based_dnm_factory.NodeBasedDnmFactory(root_processor: NodeProcessor)

Bases: DnmFactory

make_dnm_from_meta(dnm_meta: DnmMeta) Dnm
class spotterbase.dnm.node_based_dnm_factory.NodeProcessor

Bases: ABC

abstract apply(node: _Element, dnm_meta: DnmMeta) tuple[Iterable[int], Iterable[int], Iterable[str]]
class spotterbase.dnm.node_based_dnm_factory.ReplacingNP(replacement_pattern: ReplacementPattern, category: str, number_replacements: bool = True, keep_annotation: bool = True)

Bases: NodeProcessor

apply(node: _Element, dnm_meta: DnmMeta) tuple[Iterable[int], Iterable[int], Iterable[str]]
class spotterbase.dnm.node_based_dnm_factory.SkippingNP

Bases: NodeProcessor

apply(node: _Element, dnm_meta: DnmMeta) tuple[Iterable[int], Iterable[int], Iterable[str]]
class spotterbase.dnm.node_based_dnm_factory.SourceHtmlNP

Bases: NodeProcessor

Essentially outputs the original HTML source of the node (regenerated from the DOM). TODO: Currently, attributes are skipped; this should be configurable.

apply(node: _Element, dnm_meta: DnmMeta) tuple[Iterable[int], Iterable[int], Iterable[str]]
class spotterbase.dnm.node_based_dnm_factory.TextExtractingBlockedNP(child_processor: NodeProcessor)

Bases: NodeProcessor

Extracts the text content of a node, but processes the children with a different processor. This can be useful to avoid infinite recursion in some cases.

apply(node: _Element, dnm_meta: DnmMeta) tuple[Iterable[int], Iterable[int], Iterable[str]]
class spotterbase.dnm.node_based_dnm_factory.TextExtractingNP

Bases: NodeProcessor

apply(node: _Element, dnm_meta: DnmMeta) tuple[Iterable[int], Iterable[int], Iterable[str]]
register_class_processor(class_: str, processor: NodeProcessor)
register_tag_processor(tag: str, processor: NodeProcessor)
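`register_tag_processor` and `register_class_processor` let a text-extracting processor hand specific children to other processors, for example skipping or replacing them. Below is a minimal sketch of that dispatch using the standard library's `xml.etree.ElementTree`. For brevity it returns only text pieces and omits the two reference sequences that the real `apply` also returns. `Processor`, `Skipping`, `Replacing` and `TextExtracting` are hypothetical analogues, and checking class rules before tag rules is an assumption.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class Processor(ABC):
    """Hypothetical simplified NodeProcessor: returns text pieces only (no references)."""

    @abstractmethod
    def apply(self, node: ET.Element) -> list[str]: ...


class Skipping(Processor):
    def apply(self, node: ET.Element) -> list[str]:
        # Drop the node's content entirely.
        return []


class Replacing(Processor):
    def __init__(self, replacement: str) -> None:
        self.replacement = replacement

    def apply(self, node: ET.Element) -> list[str]:
        # Emit a fixed placeholder instead of the node's content.
        return [self.replacement]


class TextExtracting(Processor):
    def __init__(self) -> None:
        self.tag_processors: dict[str, Processor] = {}
        self.class_processors: dict[str, Processor] = {}

    def register_tag_processor(self, tag: str, processor: Processor) -> None:
        self.tag_processors[tag] = processor

    def register_class_processor(self, class_: str, processor: Processor) -> None:
        self.class_processors[class_] = processor

    def _processor_for(self, child: ET.Element) -> Processor:
        # Assumption: class-based rules are checked before tag-based ones.
        for cls in child.get("class", "").split():
            if cls in self.class_processors:
                return self.class_processors[cls]
        # Unregistered children are processed recursively by this same processor.
        return self.tag_processors.get(child.tag, self)

    def apply(self, node: ET.Element) -> list[str]:
        pieces = [node.text or ""]
        for child in node:
            pieces.extend(self._processor_for(child).apply(child))
            # The tail belongs to the parent, so it is kept whatever the child's processor did.
            pieces.append(child.tail or "")
        return pieces


root = ET.fromstring('<p>Let <math>x^2</math> be <span class="ltx_note">note</span>given.</p>')
processor = TextExtracting()
processor.register_tag_processor("math", Replacing("MathFormula"))
processor.register_class_processor("ltx_note", Skipping())
print("".join(processor.apply(root)))  # Let MathFormula be given.
```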
class spotterbase.dnm.node_based_dnm_factory.TokenAfterNodeNP(token: str, node_processor: NodeProcessor)

Bases: NodeProcessor

apply(node: _Element, dnm_meta: DnmMeta) tuple[Iterable[int], Iterable[int], Iterable[str]]
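`TokenAfterNodeNP` takes a token and another processor. That suggests a wrapper which delegates to the inner processor and then appends the token, for example to keep a separator after block elements such as paragraphs. Below is a minimal sketch under that assumption. Like the simplified processors above, it returns only text pieces. `Processor`, `PlainText` and `TokenAfter` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Protocol


class Processor(Protocol):
    def apply(self, node: ET.Element) -> list[str]: ...


class PlainText:
    """Hypothetical inner processor: all text inside the node, in document order."""

    def apply(self, node: ET.Element) -> list[str]:
        return ["".join(node.itertext())]


class TokenAfter:
    """Hypothetical analogue of TokenAfterNodeNP: delegate, then append a fixed token (assumed)."""

    def __init__(self, token: str, node_processor: Processor) -> None:
        self.token = token
        self.node_processor = node_processor

    def apply(self, node: ET.Element) -> list[str]:
        return [*self.node_processor.apply(node), self.token]


doc = ET.fromstring("<div><p>First.</p><p>Second.</p></div>")
paragraph = TokenAfter("\n", PlainText())
print(repr("".join(piece for p in doc for piece in paragraph.apply(p))))  # 'First.\nSecond.\n'
```

Wrapping rather than subclassing means any processor can gain a trailing token without changes to its own code.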

spotterbase.dnm.post_processing_dnm_factory module

class spotterbase.dnm.post_processing_dnm_factory.PostProcessingDnmFactory(main_factory: DnmFactory, post_processor: Callable[[Dnm], Dnm])

Bases: DnmFactory

make_dnm_from_meta(dnm_meta: DnmMeta) Dnm
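`PostProcessingDnmFactory` acts as a decorator around another factory. It lets `main_factory` build the DNM and then passes the result through `post_processor`. Since `whitespace_normalization_post_processing` has the matching `Dnm -> Dnm` signature, it can presumably be attached to any factory this way. Below is a sketch of the composition. It uses hypothetical stand-ins (`Factory`, `IdentityFactory`, `PostProcessing`, `collapse_whitespace`) that operate on plain strings rather than `Dnm` objects.

```python
from abc import ABC, abstractmethod
from typing import Callable


class Factory(ABC):
    @abstractmethod
    def make(self, source: str) -> str: ...


class IdentityFactory(Factory):
    def make(self, source: str) -> str:
        return source


class PostProcessing(Factory):
    """Hypothetical analogue of PostProcessingDnmFactory."""

    def __init__(self, main_factory: Factory, post_processor: Callable[[str], str]) -> None:
        self.main_factory = main_factory
        self.post_processor = post_processor

    def make(self, source: str) -> str:
        # Build with the wrapped factory first, then post-process the result.
        return self.post_processor(self.main_factory.make(source))


def collapse_whitespace(s: str) -> str:
    # Collapse runs of whitespace and trim both ends.
    return " ".join(s.split())


factory = PostProcessing(IdentityFactory(), collapse_whitespace)
print(factory.make("  some\n  text "))  # some text
```

Because the wrapper is itself a factory, several post-processors can be stacked, e.g. `PostProcessing(factory, str.upper)`.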

spotterbase.dnm.range_subst module

class spotterbase.dnm.range_subst.RangeSubstituter(to_substitute: Iterable[tuple[tuple[int, int], tuple[str, _T]]])

Bases: Generic[_T]

apply(dnm: LinkedStr_T) LinkedStr_T
ordered_ref_ranges: list[tuple[int, int]]
replacement_values: dict[tuple[int, int], str]
substitution_values: dict[tuple[int, int], _T]
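`RangeSubstituter` is built from pairs `((start_ref, end_ref), (replacement, value))` and keeps them in the three attributes listed above. The sketch below shows how those attributes could fit together. It assumes non-overlapping ranges. To stay self-contained, it also assumes a plain string whose references are its own character indices, whereas the real `apply` works on a `LinkedStr`. `SimpleRangeSubstituter` is hypothetical, and the stored value (here the original markup) is only an example payload.

```python
from typing import Generic, Iterable, TypeVar

_T = TypeVar("_T")


class SimpleRangeSubstituter(Generic[_T]):
    """Hypothetical analogue of RangeSubstituter operating on a plain string."""

    def __init__(self, to_substitute: Iterable[tuple[tuple[int, int], tuple[str, _T]]]) -> None:
        self.replacement_values: dict[tuple[int, int], str] = {}
        self.substitution_values: dict[tuple[int, int], _T] = {}
        for ref_range, (replacement, value) in to_substitute:
            self.replacement_values[ref_range] = replacement
            self.substitution_values[ref_range] = value
        # Ranges sorted by position so they can be applied left to right.
        self.ordered_ref_ranges: list[tuple[int, int]] = sorted(self.replacement_values)

    def apply(self, text: str) -> str:
        pieces, pos = [], 0
        for start, end in self.ordered_ref_ranges:
            pieces.append(text[pos:start])
            pieces.append(self.replacement_values[(start, end)])
            pos = end
        pieces.append(text[pos:])
        return "".join(pieces)


subst = SimpleRangeSubstituter([((4, 9), ("FORMULA", "<math>x+y=z</math>"))])
print(subst.apply("Let x+y=z hold."))     # Let FORMULA hold.
print(subst.substitution_values[(4, 9)])  # <math>x+y=z</math>
```

Keeping the replacement string and the substituted value in separate dictionaries lets a caller map a placeholder back to whatever it replaced.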

spotterbase.dnm.replacement_pattern module

class spotterbase.dnm.replacement_pattern.CategoryStyle

Bases: object

ALL_CAPS() str
CAMEL_CASE() str
HYPHENATED() str
class spotterbase.dnm.replacement_pattern.ReplacementPattern

Bases: ABC

class spotterbase.dnm.replacement_pattern.StandardReplacementPattern(prefix: 'str' = '', infix: 'str' = '', suffix: 'str' = '', include_infix_if_unnumbered: 'bool' = False, include_prefix_suffix_if_unnumbered: 'bool' = True, category_style: 'Callable[[str], str]' = <functools._lru_cache_wrapper object>)

Bases: ReplacementPattern

category_style() str
include_infix_if_unnumbered: bool = False
include_prefix_suffix_if_unnumbered: bool = True
infix: str = ''
prefix: str = ''
suffix: str = ''
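`StandardReplacementPattern` appears to build a replacement token from several parts: a prefix, the styled category name, an optional number preceded by the infix, and a suffix. The two `include_*_if_unnumbered` flags then decide what survives when a replacement is not numbered. The assembly order below is an assumption inferred from the field names. `Pattern`, `render` and `all_caps` are hypothetical. The `lru_cache` decorator mirrors the cached default shown in the signature.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Callable, Optional


@lru_cache
def all_caps(category: str) -> str:
    # Hypothetical style: "inline math" -> "INLINE_MATH".
    return "_".join(category.upper().split())


@dataclass(frozen=True)
class Pattern:
    """Hypothetical analogue of StandardReplacementPattern (assembly order assumed)."""

    prefix: str = ""
    infix: str = ""
    suffix: str = ""
    include_infix_if_unnumbered: bool = False
    include_prefix_suffix_if_unnumbered: bool = True
    category_style: Callable[[str], str] = all_caps

    def render(self, category: str, number: Optional[int] = None) -> str:
        numbered = number is not None
        keep_affixes = numbered or self.include_prefix_suffix_if_unnumbered
        parts = [self.prefix] if keep_affixes else []
        parts.append(self.category_style(category))
        if numbered or self.include_infix_if_unnumbered:
            parts.append(self.infix)
        if numbered:
            parts.append(str(number))
        if keep_affixes:
            parts.append(self.suffix)
        return "".join(parts)


numbered = Pattern(prefix="[", infix="_", suffix="]")
print(numbered.render("inline math", 3))  # [INLINE_MATH_3]
print(numbered.render("inline math"))     # [INLINE_MATH]
```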

spotterbase.dnm.simple_dnm_factory module

class spotterbase.dnm.simple_dnm_factory.SimpleDnmFactory(nodes_to_replace: dict[str, str] | None = None, classes_to_replace: dict[str, str] | None = None)

Bases: DnmFactory

make_dnm_from_meta(dnm_meta: DnmMeta) Dnm

spotterbase.dnm.xml_util module

class spotterbase.dnm.xml_util.XmlNode(node: _Element, text_node: Literal['text', 'tail', 'none'] = 'none')

Bases: object

The lxml implementation of text nodes is not very convenient, so this class wraps the nodes that are needed.

getparent() XmlNode | None
classmethod new(node: _Element | _ElementUnicodeResult) XmlNode
node: _Element
tail: bool = False
text: bool = False
spotterbase.dnm.xml_util.get_node_classes(node: _Element) list[str]
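In lxml, text is not a node of its own, and the standard library's `xml.etree.ElementTree` uses the same model. The text before an element's first child is stored in `node.text`, and the text after an element's closing tag is stored in that element's `tail`. This is why `XmlNode` pairs an element with a `text_node` flag. The sketch below uses the standard library to list every text segment in document order as `(element, 'text' | 'tail', string)`. `text_segments` and `node_classes` are hypothetical helpers. The latter assumes that `get_node_classes` splits the `class` attribute on whitespace.

```python
import xml.etree.ElementTree as ET
from typing import Iterator


def text_segments(node: ET.Element) -> Iterator[tuple[ET.Element, str, str]]:
    """Yield (element, 'text' | 'tail', string) for every non-empty text segment, in document order."""
    if node.text:
        yield node, "text", node.text
    for child in node:
        yield from text_segments(child)
        if child.tail:
            # The tail comes after the child's closing tag but is stored on the child.
            yield child, "tail", child.tail


def node_classes(node: ET.Element) -> list[str]:
    # Hypothetical equivalent of get_node_classes: whitespace-separated class names.
    return node.get("class", "").split()


root = ET.fromstring('<p>Hello <b class="ltx_font_bold x">bold</b> world</p>')
for element, kind, value in text_segments(root):
    print(element.tag, kind, repr(value))
# p text 'Hello '
# b text 'bold'
# b tail ' world'
print(node_classes(root[0]))  # ['ltx_font_bold', 'x']
```

Note that " world" is reported on `<b>`, not on `<p>`, even though it is part of the paragraph's content. A wrapper makes that distinction explicit.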

Module contents