spotterbase.dnm package

Submodules

spotterbase.dnm.defaults module

spotterbase.dnm.defaults.get_arxmliv_dnm_factory(*, decorate_replacements: bool = True, number_replacements: bool = True, keep_titles: bool = True, keep_replacements_as_annotations: bool = True, normalize_white_space: bool = True, wrap_replacements_with_spaces: bool = False) → DnmFactory

spotterbase.dnm.defaults.get_simple_arxmliv_factory(*, token_prefix: str = '', token_suffix: str = '', keep_titles: bool = True, core_token_processor: Callable[[str], str] = _default_core_token_processor) → SimpleDnmFactory

spotterbase.dnm.defaults.whitespace_normalization_post_processing(dnm: Dnm) → Dnm

spotterbase.dnm.dnm module

class spotterbase.dnm.dnm.Dnm(*, meta_info: _MetaInfoType, string: str | None = None, start_refs: Sequence[int] | None = None, end_refs: Sequence[int] | None = None, _rel_data: _RelData[_MetaInfoType] | None = None)

Bases: LinkedStr[DnmMeta]

get_embedded_annotations() → list[Annotation]

sub_dnm_from_dom_range(dom_range: DomRange) → tuple[Dnm, DnmMatchIssues]

sub_dnm_from_ref_range(start_ref: int, end_ref: int) → tuple[Dnm, DnmMatchIssues]

to_dom() → DomRange

to_dom_offset_range() → DomOffsetRange

to_fragment_target(target_uri: str | Uri | URIRef | Path | VocabularyMeta) → FragmentTarget

class spotterbase.dnm.dnm.DnmFactory

Bases: ABC

anonymous_dnm_from_node(node: _Element) → Dnm

dnm_from_document(document: Document) → Dnm

abstract make_dnm_from_meta(dnm_meta: DnmMeta) → Dnm

class spotterbase.dnm.dnm.DnmMatchIssues(dom_start_earlier: bool, dom_end_later: bool, dom_start_later: bool, dom_end_earlier: bool)

Bases: object

Mapping a part of the DOM to the DNM might not yield a perfect match. This class describes the mismatch in more detail.

property dnm_covers_more: bool

property dnm_misses_something: bool

dom_end_earlier: bool

dom_end_later: bool

dom_start_earlier: bool

dom_start_later: bool
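
The relationship between the four flags and the two summary properties is not spelled out above. The following is a minimal sketch under an assumption: `dnm_misses_something` is taken to mean that the DOM range extends beyond the DNM range, and `dnm_covers_more` the opposite. `MatchIssuesSketch` is a hypothetical stand-in, not the library class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchIssuesSketch:
    """Hypothetical stand-in mirroring the four flags of DnmMatchIssues."""
    dom_start_earlier: bool
    dom_end_later: bool
    dom_start_later: bool
    dom_end_earlier: bool

    @property
    def dnm_misses_something(self) -> bool:
        # Assumption: the DOM range sticks out on either side of the DNM range.
        return self.dom_start_earlier or self.dom_end_later

    @property
    def dnm_covers_more(self) -> bool:
        # Assumption: the DNM range sticks out on either side of the DOM range.
        return self.dom_start_later or self.dom_end_earlier


perfect = MatchIssuesSketch(False, False, False, False)
clipped = MatchIssuesSketch(True, False, False, False)
```

Under this reading, a caller of `sub_dnm_from_dom_range` could check both properties and accept the result only when neither is set.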

class spotterbase.dnm.dnm.DnmMeta(dom: '_Element', offset_converter: 'OffsetConverter', selector_converter: 'SelectorConverter', uri: 'Uri')

Bases: object

dom: _Element

embedded_annotations: EmbeddedAnnotations

offset_converter: OffsetConverter

selector_converter: SelectorConverter

uri: Uri

class spotterbase.dnm.dnm.EmbeddedAnnotations

Bases: object

get_next_replacement_number(category: str) → int

insert(replacement: str, range_: DomOffsetRange, annotation: Annotation, replacement_unique: bool = True)

spotterbase.dnm.linked_str module

class spotterbase.dnm.linked_str.LinkedStr(*, meta_info: _MetaInfoType, string: str | None = None, start_refs: Sequence[int] | None = None, end_refs: Sequence[int] | None = None, _rel_data: _RelData[_MetaInfoType] | None = None)

Bases: Generic[_MetaInfoType]

Should be treated as immutable! For optimization, references are used (e.g. when creating a sub-linked-str).

char_at(pos: int) → str

get_end_ref() → int

get_end_refs() → Sequence[int]

get_indices_from_ref_range(start_ref, end_ref) → tuple[int, int]

get_meta_info() → _MetaInfoType

get_start_ref() → int

get_start_refs() → Sequence[int]

lower() → LinkedStr_T

normalize_spaces() → LinkedStr_T

Replaces each sequence of whitespace characters with a single one.

replacements_at_positions(replacements: Iterable[tuple[int, int, str]], positions_are_references: bool = True) → LinkedStr_T

strip() → LinkedStr_T

upper() → LinkedStr_T

with_string(string: str) → LinkedStr_T
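
To illustrate the idea behind per-character references, here is a simplified sketch. `MiniLinkedStr` and its `from_string` helper are hypothetical and only mimic the `start_refs`/`end_refs` parameters listed above; the real class may store and share its data differently.

```python
import re
from typing import Sequence


class MiniLinkedStr:
    """Hypothetical, simplified stand-in for LinkedStr (not the library class)."""

    def __init__(self, string: str, start_refs: Sequence[int], end_refs: Sequence[int]):
        assert len(string) == len(start_refs) == len(end_refs)
        self.string = string
        self.start_refs = list(start_refs)
        self.end_refs = list(end_refs)

    @classmethod
    def from_string(cls, string: str) -> "MiniLinkedStr":
        # Character i covers the source offsets [i, i + 1).
        return cls(string, range(len(string)), range(1, len(string) + 1))

    def __getitem__(self, item: slice) -> "MiniLinkedStr":
        return MiniLinkedStr(self.string[item], self.start_refs[item], self.end_refs[item])

    def strip(self) -> "MiniLinkedStr":
        left = len(self.string) - len(self.string.lstrip())
        right = len(self.string.rstrip())
        return self[left:right]

    def normalize_spaces(self) -> "MiniLinkedStr":
        chars, starts, ends = [], [], []
        for match in re.finditer(r"\s+|\S", self.string):
            # A whitespace run collapses to one space that spans the whole run.
            chars.append(" " if match.group().isspace() else match.group())
            starts.append(self.start_refs[match.start()])
            ends.append(self.end_refs[match.end() - 1])
        return MiniLinkedStr("".join(chars), starts, ends)


raw = MiniLinkedStr.from_string("  a   b ")
clean = raw.normalize_spaces().strip()
```

After both transformations, every character of `clean.string` still points back to its offsets in the original string, which is what makes it possible to map a match in the processed text back to the source.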

class spotterbase.dnm.linked_str.LinkedStr_T

Type variable bound to LinkedStr

alias of TypeVar('LinkedStr_T', bound=LinkedStr)

spotterbase.dnm.linked_str.string_to_lstr(string: str) → LinkedStr[None]

spotterbase.dnm.node_based_dnm_factory module

class spotterbase.dnm.node_based_dnm_factory.NodeBasedDnmFactory(root_processor: NodeProcessor)

Bases: DnmFactory

make_dnm_from_meta(dnm_meta: DnmMeta) → Dnm

class spotterbase.dnm.node_based_dnm_factory.NodeProcessor

Bases: ABC

abstract apply(node: _Element, dnm_meta: DnmMeta) → tuple[Iterable[int], Iterable[int], Iterable[str]]

class spotterbase.dnm.node_based_dnm_factory.ReplacingNP(replacement_pattern: ReplacementPattern, category: str, number_replacements: bool = True, keep_annotation: bool = True)

Bases: NodeProcessor

apply(node: _Element, dnm_meta: DnmMeta) → tuple[Iterable[int], Iterable[int], Iterable[str]]

class spotterbase.dnm.node_based_dnm_factory.SkippingNP

Bases: NodeProcessor

apply(node: _Element, dnm_meta: DnmMeta) → tuple[Iterable[int], Iterable[int], Iterable[str]]

class spotterbase.dnm.node_based_dnm_factory.SourceHtmlNP

Bases: NodeProcessor

Essentially outputs the original HTML source of the node (but generates it from the DOM). TODO: Attributes are currently skipped. This should be configurable.

apply(node: _Element, dnm_meta: DnmMeta) → tuple[Iterable[int], Iterable[int], Iterable[str]]

class spotterbase.dnm.node_based_dnm_factory.TextExtractingBlockedNP(child_processor: NodeProcessor)

Bases: NodeProcessor

Extracts the text content of a node, but processes the children with a different processor. This can be useful to avoid infinite recursion in some cases.

apply(node: _Element, dnm_meta: DnmMeta) → tuple[Iterable[int], Iterable[int], Iterable[str]]

class spotterbase.dnm.node_based_dnm_factory.TextExtractingNP

Bases: NodeProcessor

apply(node: _Element, dnm_meta: DnmMeta) → tuple[Iterable[int], Iterable[int], Iterable[str]]

register_class_processor(class_: str, processor: NodeProcessor)

register_tag_processor(tag: str, processor: NodeProcessor)

class spotterbase.dnm.node_based_dnm_factory.TokenAfterNodeNP(token: str, node_processor: NodeProcessor)

Bases: NodeProcessor

apply(node: _Element, dnm_meta: DnmMeta) → tuple[Iterable[int], Iterable[int], Iterable[str]]
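
The node processors in this module compose: a text-extracting processor walks the tree and hands selected tags to other processors. The sketch below illustrates that pattern with hypothetical classes built on the standard library's `xml.etree.ElementTree` instead of lxml. It is deliberately simplified: `apply` returns plain strings rather than the reference/string triple shown above, and the replacement counter lives in the processor itself.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class Processor(ABC):
    """Hypothetical, simplified counterpart of NodeProcessor."""

    @abstractmethod
    def apply(self, node: ET.Element) -> list[str]: ...


class Skipping(Processor):
    def apply(self, node: ET.Element) -> list[str]:
        return []  # the node contributes nothing to the output


class Replacing(Processor):
    def __init__(self, category: str):
        self.category = category
        self.count = 0

    def apply(self, node: ET.Element) -> list[str]:
        self.count += 1  # number the replacements in document order
        return [f"{self.category}{self.count}"]


class TextExtracting(Processor):
    def __init__(self):
        self.tag_processors: dict[str, Processor] = {}

    def register_tag_processor(self, tag: str, processor: Processor) -> None:
        self.tag_processors[tag] = processor

    def apply(self, node: ET.Element) -> list[str]:
        parts = [node.text or ""]
        for child in node:
            # Registered tags are delegated; all other children recurse into self.
            parts += self.tag_processors.get(child.tag, self).apply(child)
            parts.append(child.tail or "")
        return parts


extractor = TextExtracting()
extractor.register_tag_processor("math", Replacing("MathNode"))
extractor.register_tag_processor("script", Skipping())
root = ET.fromstring("<p>Let <math>x</math> and <math>y</math><script>z</script> be given.</p>")
text = "".join(extractor.apply(root))
```

Here `text` reads like the paragraph with each formula replaced by a numbered token and the script content dropped.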

spotterbase.dnm.post_processing_dnm_factory module

class spotterbase.dnm.post_processing_dnm_factory.PostProcessingDnmFactory(main_factory: DnmFactory, post_processor: Callable[[Dnm], Dnm])

Bases: DnmFactory

make_dnm_from_meta(dnm_meta: DnmMeta) → Dnm
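
The constructor signature suggests plain composition: build a result with the main factory, then pass it through the post-processor. A minimal sketch of that idea follows, using strings in place of `Dnm` objects; `make_post_processing_factory` and `collapse_whitespace` are hypothetical names.

```python
from typing import Callable


def make_post_processing_factory(
    main_factory: Callable[[str], str],
    post_processor: Callable[[str], str],
) -> Callable[[str], str]:
    """Hypothetical helper: run main_factory, then post-process its result."""
    def factory(source: str) -> str:
        return post_processor(main_factory(source))
    return factory


def collapse_whitespace(text: str) -> str:
    # Stand-in for a post-processor such as whitespace normalization.
    return " ".join(text.split())


factory = make_post_processing_factory(str.upper, collapse_whitespace)
result = factory("  some   text ")
```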

spotterbase.dnm.range_subst module

class spotterbase.dnm.range_subst.RangeSubstituter(to_substitute: Iterable[tuple[tuple[int, int], tuple[str, _T]]])

Bases: Generic[_T]

apply(dnm: LinkedStr_T) → LinkedStr_T

ordered_ref_ranges: list[tuple[int, int]]

replacement_values: dict[tuple[int, int], str]

substitution_values: dict[tuple[int, int], _T]
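
As an illustration of range substitution in general, the sketch below replaces several `(start, end)` ranges of a plain string in one pass. It assumes half-open, non-overlapping ranges and ignores the attached `_T` values; `substitute_ranges` is a hypothetical helper, not the library's implementation.

```python
from typing import Iterable


def substitute_ranges(text: str, substitutions: Iterable[tuple[tuple[int, int], str]]) -> str:
    """Hypothetical helper: replace each [start, end) range with its replacement."""
    pieces, cursor = [], 0
    # Process ranges left to right so untouched text can be copied in between.
    for (start, end), replacement in sorted(substitutions):
        if start < cursor:
            raise ValueError(f"range ({start}, {end}) overlaps an earlier range")
        pieces.append(text[cursor:start])
        pieces.append(replacement)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


result = substitute_ranges(
    "the value x+1 exceeds y",
    [((22, 23), "VAR"), ((10, 13), "FORMULA")],  # order of the input does not matter
)
```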

spotterbase.dnm.replacement_pattern module

class spotterbase.dnm.replacement_pattern.CategoryStyle

Bases: object

ALL_CAPS() → str

CAMEL_CASE() → str

HYPHENATED() → str

class spotterbase.dnm.replacement_pattern.ReplacementPattern

Bases: ABC

class spotterbase.dnm.replacement_pattern.StandardReplacementPattern(prefix: 'str' = '', infix: 'str' = '', suffix: 'str' = '', include_infix_if_unnumbered: 'bool' = False, include_prefix_suffix_if_unnumbered: 'bool' = True, category_style: 'Callable[[str], str]' = <functools._lru_cache_wrapper object>)

Bases: ReplacementPattern

category_style() → str

include_infix_if_unnumbered: bool = False

include_prefix_suffix_if_unnumbered: bool = True

infix: str = ''

prefix: str = ''

suffix: str = ''
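
How these fields combine into a replacement token is not documented here. The sketch below shows one plausible reading, in which a token is built as prefix, styled category, infix, number, suffix. `PatternSketch`, its `render` method, and `camel_case` are all hypothetical and may not match the library's behavior.

```python
from dataclasses import dataclass
from typing import Callable, Optional


def camel_case(category: str) -> str:
    """Hypothetical category style: 'math node' becomes 'MathNode'."""
    return "".join(word.capitalize() for word in category.split())


@dataclass(frozen=True)
class PatternSketch:
    """Hypothetical stand-in mirroring the fields of StandardReplacementPattern."""
    prefix: str = ""
    infix: str = ""
    suffix: str = ""
    include_infix_if_unnumbered: bool = False
    include_prefix_suffix_if_unnumbered: bool = True
    category_style: Callable[[str], str] = camel_case

    def render(self, category: str, number: Optional[int] = None) -> str:
        core = self.category_style(category)
        if number is not None:
            # Assumption: numbered tokens always use prefix, infix and suffix.
            return f"{self.prefix}{core}{self.infix}{number}{self.suffix}"
        if self.include_infix_if_unnumbered:
            core += self.infix
        if self.include_prefix_suffix_if_unnumbered:
            return f"{self.prefix}{core}{self.suffix}"
        return core


pattern = PatternSketch(prefix="[", infix="_", suffix="]")
numbered = pattern.render("math node", 3)
unnumbered = pattern.render("math node")
```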

spotterbase.dnm.simple_dnm_factory module

class spotterbase.dnm.simple_dnm_factory.SimpleDnmFactory(nodes_to_replace: dict[str, str] | None = None, classes_to_replace: dict[str, str] | None = None)

Bases: DnmFactory

make_dnm_from_meta(dnm_meta: DnmMeta) → Dnm

spotterbase.dnm.xml_util module

class spotterbase.dnm.xml_util.XmlNode(node: _Element, text_node: Literal['text', 'tail', 'none'] = 'none')

Bases: object

The lxml implementation of text nodes is not very convenient, so this class provides a wrapper for the nodes that are needed.

getparent() → XmlNode | None

classmethod new(node: _Element | _ElementUnicodeResult) → XmlNode

node: _Element

tail: bool = False

text: bool = False
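
For background on why a wrapper helps: in the ElementTree data model that lxml follows, text is not represented by node objects of its own but by the `text` and `tail` attributes of elements. The short example below uses the standard library's `xml.etree.ElementTree` to stay dependency-free; it only demonstrates the data model, not `XmlNode` itself.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<p>Hello <b>bold</b> world</p>")
bold = root[0]

# Each piece of text hangs off an element, either as its text or as its tail.
pieces = [
    (root.tag, "text", root.text),  # text before the first child of <p>
    (bold.tag, "text", bold.text),  # text inside <b>
    (bold.tag, "tail", bold.tail),  # text following </b>, stored on <b>
]
```

A pair of an element and one of the markers `'text'`, `'tail'` or `'none'`, as in the constructor above, is therefore enough to identify any element or piece of text in the tree.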

spotterbase.dnm.xml_util.get_node_classes(node: _Element) → list[str]
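
A function with this signature presumably splits the node's `class` attribute. A hypothetical equivalent, written against the standard library's `xml.etree.ElementTree` rather than lxml, could look like this:

```python
import xml.etree.ElementTree as ET


def node_classes(node: ET.Element) -> list[str]:
    """Hypothetical equivalent: split the class attribute on whitespace."""
    return node.get("class", "").split()


node = ET.fromstring('<span class="ltx_text  ltx_font_bold">x</span>')
classes = node_classes(node)
```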
